Statistical Models of Word Frequency and Other Count Data

Size: px
Start display at page:

Download "Statistical Models of Word Frequency and Other Count Data"

Transcription

1 Statistical Models of Word Frequency and Other Count Data Martin Jansche

2 Motivation Item counts are commonly used in NLP as independent variables in many applications: information retrieval, topic detection and tracking, text categorization, among many others. Generative models are in widespread use. Such models make predictions about the distribution of word counts, word and document lengths, etc. Parametric models are equally widespread. Their assumptions need to be checked against data.

3 What this talk is about Accurately modeling discrete properties of text documents, such as length, word frequencies, etc. Focus on word frequency in documents. Claim 1: In addition to overdispersion, variation of word frequency across documents is largely due to zero-inflation. Claim 2: Modeling zero-inflation is often preferable to modeling overdispersion.

4 What this talk is not about Estimating the probabilities of unseen words. Instead, focus on words that occur zero times in most documents (true of most words!), but do occur a few times in a small number of documents. State-of-the-art text classification. Text classification using an independent feature model is used merely for illustration, since it is simple and benefits from richer models for individual features.

5 Parametric models Encode all properties of a distribution in (typically) very few parameters. Easy to incorporate prior information about plausible values for parameters. Can work with very small amounts of data. Can work with sparse data. Often closed form expressions are available for moments, probabilities, percentiles, etc.

6 Linguistic count data Focus on modeling document length and word frequency in documents. Sample sizes are often small: most words are extremely rare and most documents are fairly short. Overdispersion: natural variation not well captured by simple models with very few parameters. Zero-inflation: most words occur zero times in a given document; not captured by standard models.

7 Claim 1 Overdispersed models can capture increased variance of token frequency [Mosteller and Wallace 1964, 1984; Church and Gale 1995]. Zero-inflation accounts for variation not captured by overdispersed models. Need to develop a zero-inflated extension of a robust, overdispersed model of token frequency. Zero-inflation can be observed in M&W s data.

8 Poisson family models Start with the Poisson distribution with rate λ > 0: Poisson(λ)(x) = λx x! exp( λ). A natural generalization of the Poisson is the Negative Binomial distribution, with an additional parameter κ > 0 that controls non-poissonness: NegBin(λ, κ)(x) = λx x! κ κ (λ + κ) κ Γ(κ + x) Γ(κ)(λ + κ) x

9 Poisson(λ) NegBin(λ, κ) Mean µ λ = λ Variance σ 2 λ λ (1 + λ/κ) Skewness γ 1 1 λ 1 λ Kurtosis γ 2 1 λ 1 λ κ + 2λ κ (λ + κ) κ λ + κ + 6 κ Mode λ λ (1 1/κ) if κ 1 0 if 0 < κ < 1

10 Example: Subject lines of spam frequency (number of messages) observed Poisson(38.35) NegBin(38.35, 5.52) length of subject line (number of characters)

11 Detailed examples Mosteller and Wallace s [1964, 1984] data, taken from The Federalist papers. Essays by Alexander Hamilton and James Madison (and John Jay) on the shape of the proposed US constitution. M&W sampled approx. 250 contiguous passages of equal length for each of the two main authors.

12 Some words follow the Poisson frequency (number of passages) observed Poisson(0.67) occurrences of "any" [Hamilton]

13 Some words follow the Neg. Binomial frequency (number of passages) observed Poisson(0.45) NegBin(0.45, 1.17) occurrences of "were" [Madison]

14 And some words are special For example his (Hamilton and Madison pooled) in Mosteller and Wallace s data. The method of maximum likelihood leads to NegBin(0.54, 0.15). Here s what that model has to say: obsrvd expctd

15 Alternatively, we could have estimated the parameters based on: (a) the number of documents with zero occurrences of his ; and (b) the number of documents with one occurrence of his. Not surprisingly, the resulting model, NegBin(0.76, 0.11), is worse: obsrvd expctd

16 frequency (number of passages) observed NegBin(0.54, 0.15) NegBin(0.76, 0.11) 0.34 NegBin(1.56, 0.89) occurrences of "his" [Hamilton, Madison]

17 Adaptation, burstiness and all that Church [2000]: The first mention of a word obviously depends on frequency, but surprisingly, the second does not. Adaptation [the degree to which the probability of a word encountered in recent context is increased] depends more on lexical content than frequency[.] Church, concerned mostly with empirical exploration, used nonparametric methods. How can his findings be incorporated into a parametric setting?

18 A modest proposal Whether a given word appears at all in a document is one thing. How often it appears, if it does, is another thing. Not all words are appropriate in a given context (taboo words, technical jargon, proper names). A writer s/speaker s active vocabulary is limited and idiosyncratic ( (tom/pot)atos / (tom/pot)atoes ). We insist on capturing non-zero occurrences with parametric models, but treat zeroes specially.

19 frequency (number of passages) observed NegBin(0.54, 0.15) NegBin(0.76, 0.11) 0.34 NegBin(1.56, 0.89) occurrences of "his" [Hamilton, Madison]

20 A concrete modest proposal Two-component mixture: first component is a degenerate distribution at zero (or possibly a geometric distribution starting at zero); second component a standard distribution F with parameter vector θ, e. g. from the Poisson or Binomial family. ZIF(z, θ)(x) = z(x 0) + (1 z)f(θ)(x) where 0 z 1 (z < 0 may be allowable).

21 Properties of ZIF If F(θ) has mean µ and variance σ 2, then ZIF(z, θ) has mean (1 z) µ and variance (1 z) (σ 2 + z µ 2 ). Furthermore, ZIF(z, θ) has the same modes as F(θ) plus potentially an additional mode at zero.

22 Zero-inflated distributions Straightforward interpretation of generative process: pretend there is a z-biased coin; flip coin; on heads, generate 0; on tails, generate according to F. If parameter vector θ of F can be estimated straightforwardly, use EM to estimate z and θ. Otherwise use multidimenisional maximization algorithms.

23 ZINB model for his Recall that a NegBin model can already account for the fact that most of the probability mass is concentrated at zero. Can a zero-inflated NegBin (ZINB) model do better? Note that the maximum likelihood models for the distribution of his in M&W s data say very different things, even though the net effects may be superficially similar.

24 The NegBin model claims that his occurs much less than once on average (0.54 expected occurrences) and that it has large variance. The ZINB model claims that his occurs in only a third of all passages, but within those its expected number of occurrences is 1.56 and its variance is less than that predicted by the NegBin model.

25 frequency (number of passages) observed NegBin(0.54, 0.15) NegBin(0.76, 0.11) 0.34 NegBin(1.56, 0.89) occurrences of "his" [Hamilton, Madison]

26 NegBin ZINB obsrvd expctd expctd χ 2 q-value log L(ˆθ)

27 Comparison of Poisson models x Poisson(λ) µ = λ σ 2 = λ = µ x NegBin(λ, κ) x ZIPoisson(z, λ) µ = λ, µ = (1 z) λ, σ 2 = λ (1 + λ κ ) σ2 = µ (1 + z λ) x ZINegBin(z, λ, κ)

28 Comparison of Binomial models x n Binom(p) µ = n p σ 2 = n p (1 p) = µ q x n BetaBin(p, γ) x n ZIBinom(z, p) µ = n p µ = (1 z) n p σ 2 = µ q (1 + (n 1)γ) σ 2 = µ (q + z n p) x n ZIBetaBin(z, p, γ)

29 The Naive Bayes classifier We would like to have a distribution over a random variable C (class labels) conditional on independent variables X 1,..., X k and parameters θ: P (C X 1,..., X k ; θ) P (C, X 1,..., X k θ) Assume a graphical model where the only edges are from C to X i for i = 1,..., k. In other words: P (C, X 1,..., X k θ) = P (C θ) k P (X i C; θ) i=1

30 For document classification, the independent variables X i range over counts. In addition, we can condition on the document length L. For example: P (X i = x L = n, C = j; θ) ( ) n = (θ ij ) x (1 θ ij ) (n x) x Training consists of finding a point estimate of θ. Classification is done by selecting the most probable class, conditional on the values of the independent variables and the estimated ˆθ.

31 Effects on classification performance McCallum and Nigam [1998] compared multivariate Bernoulli and multinomial models. We compare (joint independent) Bernoulli, binomial, betabinomial, and zero-inflated binomial models. Bernoulli model can be interpreted as binning (nonparametric historgram method) into two dominant classes: zero and nonzero. Zero-inflated binomial should be able to combine advantages of Bernoulli and detail of binomial model.

32 naive standard Poisson 1 Negative Binomial 2 Binomial 1 Beta-Binomial 2 Multinomial k Dirichlet-Multinomial k+1 McCallum and Nigam recommended Bernoulli for small vocabulary sizes; we recommend ZIBinomial.

33 Newsgroups data set 20 Newsgroups data set, stratified so that all classes are equally likely a priori, therefore 5% baseline accuracy.

34 Newsgroups 90 classification accuracy (percent) k 2k Bernoulli Binomial ZIBinom BetaBin 5k 10k 20k vocabulary size (number of word types)

35 Binom ZIB McNemar

36 WebKB data set Web pages from CS departments, classified as faculty, student, course, and project pages documents total.

37 classification accuracy (percent) Bernoulli Binomial BetaBin ZIBinom WebKB k 2k 5k 10k vocabulary size (number of word types) 20k

38 Claim 2 Zero-inflated models perform no worse than overdispersed models. Standard zero-inflated models are easier to work with, since EM can be used for parameter estimation. Modeling zero-inflation is preferable to modeling overdispersion, at least for Naive Bayes document classification.

39 Longer documents Tom in Project Gutenberg books (15k 25k words). No surprises initially: But the tail is very long:

40 Document lengths Document length in newsgroup data is non-negative, heavily skewed to the right, and seems to be unimodal (unlike newswire). Approximated well by log-logistic density: LogLogistic(µ, σ, δ)(x) = σ ) δ 1 δ ( x µ σ [ 1 + ( ) ] x µ δ 2 σ CDF easy to invert (unlike log-normal), pth

41 percentile point is: µ + σ ( p ) 1/δ 1 p Leave µ fixed, estimate remaining two parameters from tertile points: ˆσ = t 1 µ t 2 µ ˆδ = 2 log 2 log(t 2 µ) log(t 1 µ)

42 frequency (number of documents) observed LogLogistic document length (number of words)

43 Language modeling Traditionally done using Markov chains. For example, a bigram model over {a, b} : a b b a a a b b

44 Word length Markov chains are poor models of word length. As a model of word length, a bigram model degenerates to a (shifted) geometric distribution: q p 1-p 1-q

45 Pascal distribution The geometric distribution can be generalized to the Pascal distribution, which is a special case of the Negative Binomial with κ an integer. NegBin(λ, κ)(x) = λx = κ κ Γ(κ + x) x! (λ + κ) κ Γ(κ)(λ + κ) ( ) ( ) x x ( x + κ 1 λ κ x λ + κ λ + κ ) κ

46 Reparametrize as follows: ( ) x + κ 1 Pascal(p, κ)(x) = x p x (1 p) κ For example, when κ = 4: p p p p 1-p 1-p 1-p 1-p

47 Word length in the NETtalk data Modeled by a slight variant of a Pascal model: m 1 p1 1-p1 p2 1-p2 p3 1-p3 1 1-m p p p p p 1-p 1-p 1-p 1-p 1-p With κ fixed (in this case κ = 5), we can use EM to estimate the remaining parameters.

48 frequency (number of words) observed geometric Pascal word length (number of letters)

49 Conclusions Especially for parametric models, need to check goodness of fit. Overdispersion and zero-inflation are common in count data encountered in NLP. This affects our choice of models. For example, exponential family models misleadingly known as maximum entropy models have no provisions for overdispersion. Use with caution.

Chapter 3 Statistical Quality Control, 7th Edition by Douglas C. Montgomery. Copyright (c) 2013 John Wiley & Sons, Inc.

Chapter 3 Statistical Quality Control, 7th Edition by Douglas C. Montgomery. Copyright (c) 2013 John Wiley & Sons, Inc. 1 3.1 Describing Variation Stem-and-Leaf Display Easy to find percentiles of the data; see page 69 2 Plot of Data in Time Order Marginal plot produced by MINITAB Also called a run chart 3 Histograms Useful

More information

PRE CONFERENCE WORKSHOP 3

PRE CONFERENCE WORKSHOP 3 PRE CONFERENCE WORKSHOP 3 Stress testing operational risk for capital planning and capital adequacy PART 2: Monday, March 18th, 2013, New York Presenter: Alexander Cavallo, NORTHERN TRUST 1 Disclaimer

More information

Binomial Approximation and Joint Distributions Chris Piech CS109, Stanford University

Binomial Approximation and Joint Distributions Chris Piech CS109, Stanford University Binomial Approximation and Joint Distributions Chris Piech CS109, Stanford University Four Prototypical Trajectories Review The Normal Distribution X is a Normal Random Variable: X ~ N(µ, s 2 ) Probability

More information

Some Characteristics of Data

Some Characteristics of Data Some Characteristics of Data Not all data is the same, and depending on some characteristics of a particular dataset, there are some limitations as to what can and cannot be done with that data. Some key

More information

Probability Distributions: Discrete

Probability Distributions: Discrete Probability Distributions: Discrete Introduction to Data Science Algorithms Jordan Boyd-Graber and Michael Paul SEPTEMBER 27, 2016 Introduction to Data Science Algorithms Boyd-Graber and Paul Probability

More information

Some Discrete Distribution Families

Some Discrete Distribution Families Some Discrete Distribution Families ST 370 Many families of discrete distributions have been studied; we shall discuss the ones that are most commonly found in applications. In each family, we need a formula

More information

Chapter 5. Statistical inference for Parametric Models

Chapter 5. Statistical inference for Parametric Models Chapter 5. Statistical inference for Parametric Models Outline Overview Parameter estimation Method of moments How good are method of moments estimates? Interval estimation Statistical Inference for Parametric

More information

Contents. An Overview of Statistical Applications CHAPTER 1. Contents (ix) Preface... (vii)

Contents. An Overview of Statistical Applications CHAPTER 1. Contents (ix) Preface... (vii) Contents (ix) Contents Preface... (vii) CHAPTER 1 An Overview of Statistical Applications 1.1 Introduction... 1 1. Probability Functions and Statistics... 1..1 Discrete versus Continuous Functions... 1..

More information

Random Variables Handout. Xavier Vilà

Random Variables Handout. Xavier Vilà Random Variables Handout Xavier Vilà Course 2004-2005 1 Discrete Random Variables. 1.1 Introduction 1.1.1 Definition of Random Variable A random variable X is a function that maps each possible outcome

More information

MVE051/MSG Lecture 7

MVE051/MSG Lecture 7 MVE051/MSG810 2017 Lecture 7 Petter Mostad Chalmers November 20, 2017 The purpose of collecting and analyzing data Purpose: To build and select models for parts of the real world (which can be used for

More information

CS340 Machine learning Bayesian model selection

CS340 Machine learning Bayesian model selection CS340 Machine learning Bayesian model selection Bayesian model selection Suppose we have several models, each with potentially different numbers of parameters. Example: M0 = constant, M1 = straight line,

More information

CS 237: Probability in Computing

CS 237: Probability in Computing CS 237: Probability in Computing Wayne Snyder Computer Science Department Boston University Lecture 10: o Cumulative Distribution Functions o Standard Deviations Bernoulli Binomial Geometric Cumulative

More information

Chapter 4: Commonly Used Distributions. Statistics for Engineers and Scientists Fourth Edition William Navidi

Chapter 4: Commonly Used Distributions. Statistics for Engineers and Scientists Fourth Edition William Navidi Chapter 4: Commonly Used Distributions Statistics for Engineers and Scientists Fourth Edition William Navidi 2014 by Education. This is proprietary material solely for authorized instructor use. Not authorized

More information

The Bernoulli distribution

The Bernoulli distribution This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

6. Genetics examples: Hardy-Weinberg Equilibrium

6. Genetics examples: Hardy-Weinberg Equilibrium PBCB 206 (Fall 2006) Instructor: Fei Zou email: fzou@bios.unc.edu office: 3107D McGavran-Greenberg Hall Lecture 4 Topics for Lecture 4 1. Parametric models and estimating parameters from data 2. Method

More information

Contents Part I Descriptive Statistics 1 Introduction and Framework Population, Sample, and Observations Variables Quali

Contents Part I Descriptive Statistics 1 Introduction and Framework Population, Sample, and Observations Variables Quali Part I Descriptive Statistics 1 Introduction and Framework... 3 1.1 Population, Sample, and Observations... 3 1.2 Variables.... 4 1.2.1 Qualitative and Quantitative Variables.... 5 1.2.2 Discrete and Continuous

More information

EE641 Digital Image Processing II: Purdue University VISE - October 29,

EE641 Digital Image Processing II: Purdue University VISE - October 29, EE64 Digital Image Processing II: Purdue University VISE - October 9, 004 The EM Algorithm. Suffient Statistics and Exponential Distributions Let p(y θ) be a family of density functions parameterized by

More information

A First Course in Probability

A First Course in Probability A First Course in Probability Seventh Edition Sheldon Ross University of Southern California PEARSON Prentice Hall Upper Saddle River, New Jersey 07458 Preface 1 Combinatorial Analysis 1 1.1 Introduction

More information

Statistics 6 th Edition

Statistics 6 th Edition Statistics 6 th Edition Chapter 5 Discrete Probability Distributions Chap 5-1 Definitions Random Variables Random Variables Discrete Random Variable Continuous Random Variable Ch. 5 Ch. 6 Chap 5-2 Discrete

More information

1. You are given the following information about a stationary AR(2) model:

1. You are given the following information about a stationary AR(2) model: Fall 2003 Society of Actuaries **BEGINNING OF EXAMINATION** 1. You are given the following information about a stationary AR(2) model: (i) ρ 1 = 05. (ii) ρ 2 = 01. Determine φ 2. (A) 0.2 (B) 0.1 (C) 0.4

More information

Exam STAM Practice Exam #1

Exam STAM Practice Exam #1 !!!! Exam STAM Practice Exam #1 These practice exams should be used during the month prior to your exam. This practice exam contains 20 questions, of equal value, corresponding to about a 2 hour exam.

More information

Probability Models.S2 Discrete Random Variables

Probability Models.S2 Discrete Random Variables Probability Models.S2 Discrete Random Variables Operations Research Models and Methods Paul A. Jensen and Jonathan F. Bard Results of an experiment involving uncertainty are described by one or more random

More information

Probability Theory. Probability and Statistics for Data Science CSE594 - Spring 2016

Probability Theory. Probability and Statistics for Data Science CSE594 - Spring 2016 Probability Theory Probability and Statistics for Data Science CSE594 - Spring 2016 What is Probability? 2 What is Probability? Examples outcome of flipping a coin (seminal example) amount of snowfall

More information

Chapter 5: Statistical Inference (in General)

Chapter 5: Statistical Inference (in General) Chapter 5: Statistical Inference (in General) Shiwen Shen University of South Carolina 2016 Fall Section 003 1 / 17 Motivation In chapter 3, we learn the discrete probability distributions, including Bernoulli,

More information

**BEGINNING OF EXAMINATION** A random sample of five observations from a population is:

**BEGINNING OF EXAMINATION** A random sample of five observations from a population is: **BEGINNING OF EXAMINATION** 1. You are given: (i) A random sample of five observations from a population is: 0.2 0.7 0.9 1.1 1.3 (ii) You use the Kolmogorov-Smirnov test for testing the null hypothesis,

More information

Chapter 7: Point Estimation and Sampling Distributions

Chapter 7: Point Estimation and Sampling Distributions Chapter 7: Point Estimation and Sampling Distributions Seungchul Baek Department of Statistics, University of South Carolina STAT 509: Statistics for Engineers 1 / 20 Motivation In chapter 3, we learned

More information

4-1. Chapter 4. Commonly Used Distributions by The McGraw-Hill Companies, Inc. All rights reserved.

4-1. Chapter 4. Commonly Used Distributions by The McGraw-Hill Companies, Inc. All rights reserved. 4-1 Chapter 4 Commonly Used Distributions 2014 by The Companies, Inc. All rights reserved. Section 4.1: The Bernoulli Distribution 4-2 We use the Bernoulli distribution when we have an experiment which

More information

Lecture III. 1. common parametric models 2. model fitting 2a. moment matching 2b. maximum likelihood 3. hypothesis testing 3a. p-values 3b.

Lecture III. 1. common parametric models 2. model fitting 2a. moment matching 2b. maximum likelihood 3. hypothesis testing 3a. p-values 3b. Lecture III 1. common parametric models 2. model fitting 2a. moment matching 2b. maximum likelihood 3. hypothesis testing 3a. p-values 3b. simulation Parameters Parameters are knobs that control the amount

More information

Chapter 7: Estimation Sections

Chapter 7: Estimation Sections 1 / 40 Chapter 7: Estimation Sections 7.1 Statistical Inference Bayesian Methods: Chapter 7 7.2 Prior and Posterior Distributions 7.3 Conjugate Prior Distributions 7.4 Bayes Estimators Frequentist Methods:

More information

Statistical Analysis of Text

Statistical Analysis of Text Statistical Analysis of Text Statistical text analysis has a long history in literary analysis and in solving disputed authorship problems First (?) is Thomas C. Mendenhall in 1887 Mendenhall Mendenhall

More information

discussion Papers Some Flexible Parametric Models for Partially Adaptive Estimators of Econometric Models

discussion Papers Some Flexible Parametric Models for Partially Adaptive Estimators of Econometric Models discussion Papers Discussion Paper 2007-13 March 26, 2007 Some Flexible Parametric Models for Partially Adaptive Estimators of Econometric Models Christian B. Hansen Graduate School of Business at the

More information

Commonly Used Distributions

Commonly Used Distributions Chapter 4: Commonly Used Distributions 1 Introduction Statistical inference involves drawing a sample from a population and analyzing the sample data to learn about the population. We often have some knowledge

More information

ECON 214 Elements of Statistics for Economists 2016/2017

ECON 214 Elements of Statistics for Economists 2016/2017 ECON 214 Elements of Statistics for Economists 2016/2017 Topic The Normal Distribution Lecturer: Dr. Bernardin Senadza, Dept. of Economics bsenadza@ug.edu.gh College of Education School of Continuing and

More information

Computational Statistics Handbook with MATLAB

Computational Statistics Handbook with MATLAB «H Computer Science and Data Analysis Series Computational Statistics Handbook with MATLAB Second Edition Wendy L. Martinez The Office of Naval Research Arlington, Virginia, U.S.A. Angel R. Martinez Naval

More information

Cambridge University Press Risk Modelling in General Insurance: From Principles to Practice Roger J. Gray and Susan M.

Cambridge University Press Risk Modelling in General Insurance: From Principles to Practice Roger J. Gray and Susan M. adjustment coefficient, 272 and Cramér Lundberg approximation, 302 existence, 279 and Lundberg s inequality, 272 numerical methods for, 303 properties, 272 and reinsurance (case study), 348 statistical

More information

Lecture 10: Point Estimation

Lecture 10: Point Estimation Lecture 10: Point Estimation MSU-STT-351-Sum-17B (P. Vellaisamy: MSU-STT-351-Sum-17B) Probability & Statistics for Engineers 1 / 31 Basic Concepts of Point Estimation A point estimate of a parameter θ,

More information

CS 361: Probability & Statistics

CS 361: Probability & Statistics March 12, 2018 CS 361: Probability & Statistics Inference Binomial likelihood: Example Suppose we have a coin with an unknown probability of heads. We flip the coin 10 times and observe 2 heads. What can

More information

Basic Procedure for Histograms

Basic Procedure for Histograms Basic Procedure for Histograms 1. Compute the range of observations (min. & max. value) 2. Choose an initial # of classes (most likely based on the range of values, try and find a number of classes that

More information

Estimation Parameters and Modelling Zero Inflated Negative Binomial

Estimation Parameters and Modelling Zero Inflated Negative Binomial CAUCHY JURNAL MATEMATIKA MURNI DAN APLIKASI Volume 4(3) (2016), Pages 115-119 Estimation Parameters and Modelling Zero Inflated Negative Binomial Cindy Cahyaning Astuti 1, Angga Dwi Mulyanto 2 1 Muhammadiyah

More information

UQ, STAT2201, 2017, Lectures 3 and 4 Unit 3 Probability Distributions.

UQ, STAT2201, 2017, Lectures 3 and 4 Unit 3 Probability Distributions. UQ, STAT2201, 2017, Lectures 3 and 4 Unit 3 Probability Distributions. Random Variables 2 A random variable X is a numerical (integer, real, complex, vector etc.) summary of the outcome of the random experiment.

More information

Probability and Statistics

Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 3: PARAMETRIC FAMILIES OF UNIVARIATE DISTRIBUTIONS 1 Why do we need distributions?

More information

درس هفتم یادگیري ماشین. (Machine Learning) دانشگاه فردوسی مشهد دانشکده مهندسی رضا منصفی

درس هفتم یادگیري ماشین. (Machine Learning) دانشگاه فردوسی مشهد دانشکده مهندسی رضا منصفی یادگیري ماشین توزیع هاي نمونه و تخمین نقطه اي پارامترها Sampling Distributions and Point Estimation of Parameter (Machine Learning) دانشگاه فردوسی مشهد دانشکده مهندسی رضا منصفی درس هفتم 1 Outline Introduction

More information

Conjugate priors: Beta and normal Class 15, Jeremy Orloff and Jonathan Bloom

Conjugate priors: Beta and normal Class 15, Jeremy Orloff and Jonathan Bloom 1 Learning Goals Conjugate s: Beta and normal Class 15, 18.05 Jeremy Orloff and Jonathan Bloom 1. Understand the benefits of conjugate s.. Be able to update a beta given a Bernoulli, binomial, or geometric

More information

Lecture 17: More on Markov Decision Processes. Reinforcement learning

Lecture 17: More on Markov Decision Processes. Reinforcement learning Lecture 17: More on Markov Decision Processes. Reinforcement learning Learning a model: maximum likelihood Learning a value function directly Monte Carlo Temporal-difference (TD) learning COMP-424, Lecture

More information

A Stochastic Reserving Today (Beyond Bootstrap)

A Stochastic Reserving Today (Beyond Bootstrap) A Stochastic Reserving Today (Beyond Bootstrap) Presented by Roger M. Hayne, PhD., FCAS, MAAA Casualty Loss Reserve Seminar 6-7 September 2012 Denver, CO CAS Antitrust Notice The Casualty Actuarial Society

More information

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 4: May 2, Abstract

Basic Data Analysis. Stephen Turnbull Business Administration and Public Policy Lecture 4: May 2, Abstract Basic Data Analysis Stephen Turnbull Business Administration and Public Policy Lecture 4: May 2, 2013 Abstract Introduct the normal distribution. Introduce basic notions of uncertainty, probability, events,

More information

Generating Random Numbers

Generating Random Numbers Generating Random Numbers Aim: produce random variables for given distribution Inverse Method Let F be the distribution function of an univariate distribution and let F 1 (y) = inf{x F (x) y} (generalized

More information

Modeling dynamic diurnal patterns in high frequency financial data

Modeling dynamic diurnal patterns in high frequency financial data Modeling dynamic diurnal patterns in high frequency financial data Ryoko Ito 1 Faculty of Economics, Cambridge University Email: ri239@cam.ac.uk Website: www.itoryoko.com This paper: Cambridge Working

More information

A potentially useful approach to model nonlinearities in time series is to assume different behavior (structural break) in different subsamples

A potentially useful approach to model nonlinearities in time series is to assume different behavior (structural break) in different subsamples 1.3 Regime switching models A potentially useful approach to model nonlinearities in time series is to assume different behavior (structural break) in different subsamples (or regimes). If the dates, the

More information

Subject CS1 Actuarial Statistics 1 Core Principles. Syllabus. for the 2019 exams. 1 June 2018

Subject CS1 Actuarial Statistics 1 Core Principles. Syllabus. for the 2019 exams. 1 June 2018 ` Subject CS1 Actuarial Statistics 1 Core Principles Syllabus for the 2019 exams 1 June 2018 Copyright in this Core Reading is the property of the Institute and Faculty of Actuaries who are the sole distributors.

More information

Dependence Structure and Extreme Comovements in International Equity and Bond Markets

Dependence Structure and Extreme Comovements in International Equity and Bond Markets Dependence Structure and Extreme Comovements in International Equity and Bond Markets René Garcia Edhec Business School, Université de Montréal, CIRANO and CIREQ Georges Tsafack Suffolk University Measuring

More information

SOCIETY OF ACTUARIES EXAM STAM SHORT-TERM ACTUARIAL MATHEMATICS EXAM STAM SAMPLE QUESTIONS

SOCIETY OF ACTUARIES EXAM STAM SHORT-TERM ACTUARIAL MATHEMATICS EXAM STAM SAMPLE QUESTIONS SOCIETY OF ACTUARIES EXAM STAM SHORT-TERM ACTUARIAL MATHEMATICS EXAM STAM SAMPLE QUESTIONS Questions 1-307 have been taken from the previous set of Exam C sample questions. Questions no longer relevant

More information

Financial Econometrics (FinMetrics04) Time-series Statistics Concepts Exploratory Data Analysis Testing for Normality Empirical VaR

Financial Econometrics (FinMetrics04) Time-series Statistics Concepts Exploratory Data Analysis Testing for Normality Empirical VaR Financial Econometrics (FinMetrics04) Time-series Statistics Concepts Exploratory Data Analysis Testing for Normality Empirical VaR Nelson Mark University of Notre Dame Fall 2017 September 11, 2017 Introduction

More information

MATH 3200 Exam 3 Dr. Syring

MATH 3200 Exam 3 Dr. Syring . Suppose n eligible voters are polled (randomly sampled) from a population of size N. The poll asks voters whether they support or do not support increasing local taxes to fund public parks. Let M be

More information

Random Variables and Probability Distributions

Random Variables and Probability Distributions Chapter 3 Random Variables and Probability Distributions Chapter Three Random Variables and Probability Distributions 3. Introduction An event is defined as the possible outcome of an experiment. In engineering

More information

Probability Distributions: Discrete

Probability Distributions: Discrete Probability Distributions: Discrete INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber FEBRUARY 19, 2017 INFO-2301: Quantitative Reasoning 2 Paul and Boyd-Graber Probability Distributions:

More information

KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI

KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI 88 P a g e B S ( B B A ) S y l l a b u s KARACHI UNIVERSITY BUSINESS SCHOOL UNIVERSITY OF KARACHI BS (BBA) VI Course Title : STATISTICS Course Number : BA(BS) 532 Credit Hours : 03 Course 1. Statistical

More information

PROBABILITY. Wiley. With Applications and R ROBERT P. DOBROW. Department of Mathematics. Carleton College Northfield, MN

PROBABILITY. Wiley. With Applications and R ROBERT P. DOBROW. Department of Mathematics. Carleton College Northfield, MN PROBABILITY With Applications and R ROBERT P. DOBROW Department of Mathematics Carleton College Northfield, MN Wiley CONTENTS Preface Acknowledgments Introduction xi xiv xv 1 First Principles 1 1.1 Random

More information

EX-POST VERIFICATION OF PREDICTION MODELS OF WAGE DISTRIBUTIONS

EX-POST VERIFICATION OF PREDICTION MODELS OF WAGE DISTRIBUTIONS EX-POST VERIFICATION OF PREDICTION MODELS OF WAGE DISTRIBUTIONS LUBOŠ MAREK, MICHAL VRABEC University of Economics, Prague, Faculty of Informatics and Statistics, Department of Statistics and Probability,

More information

Statistics for Managers Using Microsoft Excel 7 th Edition

Statistics for Managers Using Microsoft Excel 7 th Edition Statistics for Managers Using Microsoft Excel 7 th Edition Chapter 5 Discrete Probability Distributions Statistics for Managers Using Microsoft Excel 7e Copyright 014 Pearson Education, Inc. Chap 5-1 Learning

More information

Statistics & Flood Frequency Chapter 3. Dr. Philip B. Bedient

Statistics & Flood Frequency Chapter 3. Dr. Philip B. Bedient Statistics & Flood Frequency Chapter 3 Dr. Philip B. Bedient Predicting FLOODS Flood Frequency Analysis n Statistical Methods to evaluate probability exceeding a particular outcome - P (X >20,000 cfs)

More information

Probability Weighted Moments. Andrew Smith

Probability Weighted Moments. Andrew Smith Probability Weighted Moments Andrew Smith andrewdsmith8@deloitte.co.uk 28 November 2014 Introduction If I asked you to summarise a data set, or fit a distribution You d probably calculate the mean and

More information

Lecture 3: Probability Distributions (cont d)

Lecture 3: Probability Distributions (cont d) EAS31116/B9036: Statistics in Earth & Atmospheric Sciences Lecture 3: Probability Distributions (cont d) Instructor: Prof. Johnny Luo www.sci.ccny.cuny.edu/~luo Dates Topic Reading (Based on the 2 nd Edition

More information

Point Estimators. STATISTICS Lecture no. 10. Department of Econometrics FEM UO Brno office 69a, tel

Point Estimators. STATISTICS Lecture no. 10. Department of Econometrics FEM UO Brno office 69a, tel STATISTICS Lecture no. 10 Department of Econometrics FEM UO Brno office 69a, tel. 973 442029 email:jiri.neubauer@unob.cz 8. 12. 2009 Introduction Suppose that we manufacture lightbulbs and we want to state

More information

Session 5. A brief introduction to Predictive Modeling

Session 5. A brief introduction to Predictive Modeling SOA Predictive Analytics Seminar Malaysia 27 Aug. 2018 Kuala Lumpur, Malaysia Session 5 A brief introduction to Predictive Modeling Lichen Bao, Ph.D A Brief Introduction to Predictive Modeling LICHEN BAO

More information

SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS

SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS SOCIETY OF ACTUARIES/CASUALTY ACTUARIAL SOCIETY EXAM C CONSTRUCTION AND EVALUATION OF ACTUARIAL MODELS EXAM C SAMPLE QUESTIONS Copyright 2008 by the Society of Actuaries and the Casualty Actuarial Society

More information

An Information Based Methodology for the Change Point Problem Under the Non-central Skew t Distribution with Applications.

An Information Based Methodology for the Change Point Problem Under the Non-central Skew t Distribution with Applications. An Information Based Methodology for the Change Point Problem Under the Non-central Skew t Distribution with Applications. Joint with Prof. W. Ning & Prof. A. K. Gupta. Department of Mathematics and Statistics

More information

Alternative VaR Models

Alternative VaR Models Alternative VaR Models Neil Roeth, Senior Risk Developer, TFG Financial Systems. 15 th July 2015 Abstract We describe a variety of VaR models in terms of their key attributes and differences, e.g., parametric

More information

Negative binomial distribution - Wikipedia, the fr...

Negative binomial distribution - Wikipedia, the fr... Negative binomial distribution From Wikipedia, the free encyclopedia In probability theory and statistics, the negative binomial distribution is a discrete probability distribution of the number of successes

More information

Lean Six Sigma: Training/Certification Books and Resources

Lean Six Sigma: Training/Certification Books and Resources Lean Si Sigma Training/Certification Books and Resources Samples from MINITAB BOOK Quality and Si Sigma Tools using MINITAB Statistical Software A complete Guide to Si Sigma DMAIC Tools using MINITAB Prof.

More information

An Improved Skewness Measure

An Improved Skewness Measure An Improved Skewness Measure Richard A. Groeneveld Professor Emeritus, Department of Statistics Iowa State University ragroeneveld@valley.net Glen Meeden School of Statistics University of Minnesota Minneapolis,

More information

ME3620. Theory of Engineering Experimentation. Spring Chapter III. Random Variables and Probability Distributions.

ME3620. Theory of Engineering Experimentation. Spring Chapter III. Random Variables and Probability Distributions. ME3620 Theory of Engineering Experimentation Chapter III. Random Variables and Probability Distributions Chapter III 1 3.2 Random Variables In an experiment, a measurement is usually denoted by a variable

More information

Analysis of truncated data with application to the operational risk estimation

Analysis of truncated data with application to the operational risk estimation Analysis of truncated data with application to the operational risk estimation Petr Volf 1 Abstract. Researchers interested in the estimation of operational risk often face problems arising from the structure

More information

Estimation after Model Selection

Estimation after Model Selection Estimation after Model Selection Vanja M. Dukić Department of Health Studies University of Chicago E-Mail: vanja@uchicago.edu Edsel A. Peña* Department of Statistics University of South Carolina E-Mail:

More information

CSE 312 Winter Learning From Data: Maximum Likelihood Estimators (MLE)

CSE 312 Winter Learning From Data: Maximum Likelihood Estimators (MLE) CSE 312 Winter 2017 Learning From Data: Maximum Likelihood Estimators (MLE) 1 Parameter Estimation Given: independent samples x1, x2,..., xn from a parametric distribution f(x θ) Goal: estimate θ. Not

More information

The normal distribution is a theoretical model derived mathematically and not empirically.

The normal distribution is a theoretical model derived mathematically and not empirically. Sociology 541 The Normal Distribution Probability and An Introduction to Inferential Statistics Normal Approximation The normal distribution is a theoretical model derived mathematically and not empirically.

More information

Appendix A. Selecting and Using Probability Distributions. In this appendix

Appendix A. Selecting and Using Probability Distributions. In this appendix Appendix A Selecting and Using Probability Distributions In this appendix Understanding probability distributions Selecting a probability distribution Using basic distributions Using continuous distributions

More information

Review for Final Exam Spring 2014 Jeremy Orloff and Jonathan Bloom

Review for Final Exam Spring 2014 Jeremy Orloff and Jonathan Bloom Review for Final Exam 18.05 Spring 2014 Jeremy Orloff and Jonathan Bloom THANK YOU!!!! JON!! PETER!! RUTHI!! ERIKA!! ALL OF YOU!!!! Probability Counting Sets Inclusion-exclusion principle Rule of product

More information

Chapter 7: Estimation Sections

Chapter 7: Estimation Sections Chapter 7: Estimation Sections 7.1 Statistical Inference Bayesian Methods: 7.2 Prior and Posterior Distributions 7.3 Conjugate Prior Distributions Frequentist Methods: 7.5 Maximum Likelihood Estimators

More information

Chapter 7: Estimation Sections

Chapter 7: Estimation Sections 1 / 31 : Estimation Sections 7.1 Statistical Inference Bayesian Methods: 7.2 Prior and Posterior Distributions 7.3 Conjugate Prior Distributions 7.4 Bayes Estimators Frequentist Methods: 7.5 Maximum Likelihood

More information

Two hours. To be supplied by the Examinations Office: Mathematical Formula Tables and Statistical Tables THE UNIVERSITY OF MANCHESTER

Two hours. To be supplied by the Examinations Office: Mathematical Formula Tables and Statistical Tables THE UNIVERSITY OF MANCHESTER Two hours MATH20802 To be supplied by the Examinations Office: Mathematical Formula Tables and Statistical Tables THE UNIVERSITY OF MANCHESTER STATISTICAL METHODS Answer any FOUR of the SIX questions.

More information

SYLLABUS OF BASIC EDUCATION SPRING 2018 Construction and Evaluation of Actuarial Models Exam 4

SYLLABUS OF BASIC EDUCATION SPRING 2018 Construction and Evaluation of Actuarial Models Exam 4 The syllabus for this exam is defined in the form of learning objectives that set forth, usually in broad terms, what the candidate should be able to do in actual practice. Please check the Syllabus Updates

More information

A random variable (r. v.) is a variable whose value is a numerical outcome of a random phenomenon.

A random variable (r. v.) is a variable whose value is a numerical outcome of a random phenomenon. Chapter 14: random variables p394 A random variable (r. v.) is a variable whose value is a numerical outcome of a random phenomenon. Consider the experiment of tossing a coin. Define a random variable

More information

ADVANCED OPERATIONAL RISK MODELLING IN BANKS AND INSURANCE COMPANIES

ADVANCED OPERATIONAL RISK MODELLING IN BANKS AND INSURANCE COMPANIES Small business banking and financing: a global perspective Cagliari, 25-26 May 2007 ADVANCED OPERATIONAL RISK MODELLING IN BANKS AND INSURANCE COMPANIES C. Angela, R. Bisignani, G. Masala, M. Micocci 1

More information

Modelling strategies for bivariate circular data

Modelling strategies for bivariate circular data Modelling strategies for bivariate circular data John T. Kent*, Kanti V. Mardia, & Charles C. Taylor Department of Statistics, University of Leeds 1 Introduction On the torus there are two common approaches

More information

Describing Uncertain Variables

Describing Uncertain Variables Describing Uncertain Variables L7 Uncertainty in Variables Uncertainty in concepts and models Uncertainty in variables Lack of precision Lack of knowledge Variability in space/time Describing Uncertainty

More information

Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics.

Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Week 1 Variables: Exploration, Familiarisation and Description. Descriptive Statistics. Convergent validity: the degree to which results/evidence from different tests/sources, converge on the same conclusion.

More information

STA 532: Theory of Statistical Inference

STA 532: Theory of Statistical Inference STA 532: Theory of Statistical Inference Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA 2 Estimating CDFs and Statistical Functionals Empirical CDFs Let {X i : i n}

More information

Lecture 2. Probability Distributions Theophanis Tsandilas

Lecture 2. Probability Distributions Theophanis Tsandilas Lecture 2 Probability Distributions Theophanis Tsandilas Comment on measures of dispersion Why do common measures of dispersion (variance and standard deviation) use sums of squares: nx (x i ˆµ) 2 i=1

More information

A New Hybrid Estimation Method for the Generalized Pareto Distribution

A New Hybrid Estimation Method for the Generalized Pareto Distribution A New Hybrid Estimation Method for the Generalized Pareto Distribution Chunlin Wang Department of Mathematics and Statistics University of Calgary May 18, 2011 A New Hybrid Estimation Method for the GPD

More information

PROBABILITY AND STATISTICS

PROBABILITY AND STATISTICS Monday, January 12, 2015 1 PROBABILITY AND STATISTICS Zhenyu Ye January 12, 2015 Monday, January 12, 2015 2 References Ch10 of Experiments in Modern Physics by Melissinos. Particle Physics Data Group Review

More information

TABLE OF CONTENTS - VOLUME 2

TABLE OF CONTENTS - VOLUME 2 TABLE OF CONTENTS - VOLUME 2 CREDIBILITY SECTION 1 - LIMITED FLUCTUATION CREDIBILITY PROBLEM SET 1 SECTION 2 - BAYESIAN ESTIMATION, DISCRETE PRIOR PROBLEM SET 2 SECTION 3 - BAYESIAN CREDIBILITY, DISCRETE

More information

CFA Level I - LOS Changes

CFA Level I - LOS Changes CFA Level I - LOS Changes 2018-2019 Topic LOS Level I - 2018 (529 LOS) LOS Level I - 2019 (525 LOS) Compared Ethics 1.1.a explain ethics 1.1.a explain ethics Ethics Ethics 1.1.b 1.1.c describe the role

More information

CFA Level I - LOS Changes

CFA Level I - LOS Changes CFA Level I - LOS Changes 2017-2018 Topic LOS Level I - 2017 (534 LOS) LOS Level I - 2018 (529 LOS) Compared Ethics 1.1.a explain ethics 1.1.a explain ethics Ethics 1.1.b describe the role of a code of

More information

Binomial Random Variables. Binomial Random Variables

Binomial Random Variables. Binomial Random Variables Bernoulli Trials Definition A Bernoulli trial is a random experiment in which there are only two possible outcomes - success and failure. 1 Tossing a coin and considering heads as success and tails as

More information

ESTIMATION OF MODIFIED MEASURE OF SKEWNESS. Elsayed Ali Habib *

ESTIMATION OF MODIFIED MEASURE OF SKEWNESS. Elsayed Ali Habib * Electronic Journal of Applied Statistical Analysis EJASA, Electron. J. App. Stat. Anal. (2011), Vol. 4, Issue 1, 56 70 e-issn 2070-5948, DOI 10.1285/i20705948v4n1p56 2008 Università del Salento http://siba-ese.unile.it/index.php/ejasa/index

More information

Numerical Descriptions of Data

Numerical Descriptions of Data Numerical Descriptions of Data Measures of Center Mean x = x i n Excel: = average ( ) Weighted mean x = (x i w i ) w i x = data values x i = i th data value w i = weight of the i th data value Median =

More information

Market Risk Analysis Volume I

Market Risk Analysis Volume I Market Risk Analysis Volume I Quantitative Methods in Finance Carol Alexander John Wiley & Sons, Ltd List of Figures List of Tables List of Examples Foreword Preface to Volume I xiii xvi xvii xix xxiii

More information

Random variables. Contents

Random variables. Contents Random variables Contents 1 Random Variable 2 1.1 Discrete Random Variable............................ 3 1.2 Continuous Random Variable........................... 5 1.3 Measures of Location...............................

More information

Web Science & Technologies University of Koblenz Landau, Germany. Lecture Data Science. Statistics and Probabilities JProf. Dr.

Web Science & Technologies University of Koblenz Landau, Germany. Lecture Data Science. Statistics and Probabilities JProf. Dr. Web Science & Technologies University of Koblenz Landau, Germany Lecture Data Science Statistics and Probabilities JProf. Dr. Claudia Wagner Data Science Open Position @GESIS Student Assistant Job in Data

More information