Prediction Market Prices as Martingales: Theory and Analysis. David Klein Statistics 157


Introduction

With prediction markets growing in number and in prominence in various domains, the construction of a modeling framework for the behavior of prices on traded contracts has become an increasingly important endeavor. In this paper, we present such a theoretical framework by applying martingale theory to the analysis of prediction market price fluctuations. The application of this theory to prediction market prices generates certain predictions regarding, in particular, win probabilities, the distribution of maximum and minimum prices, and the distribution of interval crossings, which we test using empirical data on contract prices for baseball matches from the online prediction marketplace Tradesports.

Background

For the purposes of this paper, we define a prediction market as a venue at which contracts whose ultimate value depends on the occurrence or failure to occur of some specified event (presumably with a limited time horizon) are publicly traded. Classic examples of such contracts are those whose value is tied to the event that a specific candidate (e.g., Barack Obama) becomes president of the United States, or, as will be particularly relevant for this paper, the event that a sports team wins a given match. From the moment contracts are initially put up for bid by the hosting party until the time at which the contracts pay out, they may be bought and sold by individual traders. In this sense, prediction markets function as an admixture of traditional betting markets and stock markets: like stock markets and unlike betting markets, prediction market contracts may be sold by individual participants; unlike most stock markets, however, there is a clear termination point for the contract.
In general, this paper will assume the Tradesports model: contracts vary between the arbitrary values of 0 and 100; a contract is initially offered at some value between 0 and 100, and may be traded until the termination point for the contract, at which point its value is either 100 (in which case it pays out $10) or zero (in which case it pays out nothing). During the trading period for the contract, its value may fluctuate as investor beliefs about the outcome change. In this paper, we concern ourselves principally with these price fluctuations; our central tool in the analysis of these movements is an object from probability theory known as a martingale.

A sequence \(Y = (Y_0, \ldots, Y_n)\) is a martingale with respect to a random sequence \(X = (X_0, \ldots, X_n)\) if for all \(n \ge 1\) the equality

\[ E(Y_n \mid X_0, \ldots, X_{n-1}) = Y_{n-1} \]

holds. For prediction markets, if we let \(X\) be a random sequence of price perturbations, then we assert that if we define \(Y\) such that \(Y_n = \sum_{i=0}^{n} X_i\), then the price sequence \(Y\) is a martingale. This follows from the principle that the price at any given point represents the consensus probability that the event in question will occur, and is thus the fair price for the gamble. Thus, the expectation of the future price based on currently available information will be equal to the current price.

One important property of a martingale that follows directly from the definition is the fact that \(E Y_n = E Y_0\) for all \(n \ge 0\). This is easily shown using the tower property of expectation:

\[ E Y_n = E\big[E(Y_n \mid X_0, \ldots, X_{n-1})\big] = E Y_{n-1}. \]

Repeated iteration of this process gives the desired equality. Though this result applies only to a fixed time \(n\), the Optional Stopping Theorem asserts that it can be extended to a random time \(T\) given that \(T\) is a stopping time. \(T\) is defined to be a stopping time if it is decidable whether or not \(n = T\) for a given value of \(n\) based on the information contained in \(X_0, \ldots, X_n\). For example, if we define \(T\) to be the time when a gambler first achieves positive profits, then \(T\) is a stopping time; if we define \(T\) to be the time immediately prior to the gambler's first loss, \(T\) is not a stopping time. Formally, the Optional Stopping Theorem states that, for a stopping time \(T\), \(E Y_T = E Y_0\) given that \(P(T < \infty) = 1\), \(E|Y_T| < \infty\), and \(E(Y_n I_{\{T > n\}}) \to 0\) as \(n \to \infty\).

The Optional Stopping Theorem provides the basis for the following equality, from which the key theoretical results of this paper derive. Consider a price \(x\) such that \(Y_0 = x\) (\(x\) is the starting price), and prices \(a\) and \(b\) such that \(0 \le a < x < b \le 100\).
Let \(T\) be the first time the price reaches either \(a\) or \(b\), given that it starts at \(x\). \(T\) is clearly a stopping time, and it is intuitively plausible, though we omit the formal proof, that \(Y_T\) satisfies the conditions of the Optional Stopping Theorem. Thus, \(E Y_T = E Y_0 = x\).
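The identity \(E Y_T = x\) can be checked numerically on a toy martingale. The sketch below is only an illustration under stated assumptions: it uses a symmetric walk that moves up or down by one price point per step (a model chosen here for convenience, not one taken from the paper), and the function name `stopped_walk` and the prices 40, 50, and 70 are hypothetical. It propagates the walk's probability mass forward until essentially all of it has been absorbed at \(a\) or \(b\).

```python
def stopped_walk(x, a, b, steps=10_000):
    """Where does a symmetric +/-1 walk started at x first hit a or b?

    Assumes integer prices with a < x < b. The +/-1 walk is an illustrative
    martingale, not the price model used in the paper. Returns the pair
    (P(stop at a), P(stop at b)), up to the negligible mass that is still
    unabsorbed after `steps` steps.
    """
    mass = {x: 1.0}        # probability mass still strictly between a and b
    hit_a = hit_b = 0.0    # mass already absorbed at each barrier
    for _ in range(steps):
        nxt = {}
        for pos, m in mass.items():
            for new in (pos - 1, pos + 1):   # each move has probability 1/2
                if new == a:
                    hit_a += m / 2
                elif new == b:
                    hit_b += m / 2
                else:
                    nxt[new] = nxt.get(new, 0.0) + m / 2
        mass = nxt
    return hit_a, hit_b


# Hypothetical example: start at 50, stop at 40 or 70.
p_a, p_b = stopped_walk(50, 40, 70)
expected_stop_price = 40 * p_a + 70 * p_b   # should agree with x = 50
```

Propagating the full distribution rather than sampling random paths keeps the check deterministic, so any discrepancy from the starting price reflects only floating-point error and the truncated horizon.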

Additionally, if we define \(\pi_b\) to be the probability that the price reaches \(b\) before it reaches \(a\), then we have that

\[ E Y_T = (1 - \pi_b)\,a + \pi_b\, b, \]

since \(Y_T\) can take only the values \(a\) or \(b\), and it takes the former with probability \(1 - \pi_b\) and the latter with probability \(\pi_b\). Setting the two expressions for \(E Y_T\) equal to each other gives

\[ x = (1 - \pi_b)\,a + \pi_b\, b \;\Longrightarrow\; x = a + \pi_b (b - a) \;\Longrightarrow\; \pi_b = \frac{x - a}{b - a}. \tag{1} \]

It follows that

\[ \pi_a = \frac{b - x}{b - a}, \tag{2} \]

where \(\pi_a\) is the probability that the price reaches \(a\) before \(b\). A fundamental entailment of this formula is that if we suppose that the contract pays out if Team 1 wins and fails to pay out if Team 1 loses, then we may evaluate the probabilities that Team 1 wins (or, alternately, that Team 2 wins) for a given starting price \(x\) by setting \(a = 0\) and \(b = 100\). We assume here and in all cases to follow that Team 1 wins if and only if the terminal price of the contract is 100. These probabilities, respectively, are

\[ P(\text{Team 1 wins}) = \frac{x}{100}, \tag{3} \]

\[ P(\text{Team 2 wins}) = \frac{100 - x}{100}. \tag{4} \]

Additionally, we can derive certain formulae regarding \(m_Y\) and \(M_Y\), random variables representing the minimum and maximum prices recorded for a given traded contract. Clearly, if \(m_Y \le a\) for some given price \(a\), it must be the case that the price of the contract reaches \(a\) before it reaches 100. Thus, \(P(m_Y \le a) = (100 - x)/(100 - a)\), which follows from (2) with \(a = a\), \(b = 100\). Similarly, using (2) with \(a = 0\), \(b = b\), we have that \(P(M_Y < b) = (b - x)/b\).

Note that if we partition the price sequence \(Y\) for a given traded contract into non-overlapping subsequences, these subsequences are martingales as well. We use this observation in conjunction with formulae (1) and (2) and Bayes' Theorem to compute the cumulative distribution function of the minimum conditional on the outcome that Team 1 wins:

\[ P(m_Y \le a \mid \text{Team 1 wins}) = \frac{\dfrac{a}{100} \cdot \dfrac{100 - x}{100 - a}}{\dfrac{a}{100} \cdot \dfrac{100 - x}{100 - a} + 1 \cdot \dfrac{x - a}{100 - a}} = \frac{a(100 - x)}{x(100 - a)}. \tag{5} \]

The first equation makes use of the following facts: \(P(\text{Team 1 wins} \mid m_Y \le a) = a/100\); \(P(m_Y \le a) = (100 - x)/(100 - a)\); \(P(\text{Team 1 wins} \mid m_Y > a) = 1\); and \(P(m_Y > a) = (x - a)/(100 - a)\). The first, second, and fourth equalities are simple applications of (1) and (2) with appropriate choices for \(a\), \(b\), and \(x\), while the third follows from the condition that \(a \ge 0\); if the event has terminated and the price has not reached \(a\), then it has not reached zero, and therefore, it must be the case that Team 1 has won. Given these equalities, we arrive at (5) via basic algebra. Applying the same approach, we may derive a similar formula with regard to the conditional cumulative distribution function of the maximum price in the case that Team 2 wins:

\[ P(M_Y < b \mid \text{Team 2 wins}) = \frac{100(b - x)}{b(100 - x)}. \tag{6} \]

Another random variable of interest is \(Z\), the number of crossings the price makes of a given interval \([a, b]\). The price sequence \(Y\) crosses \([a, b]\) when it reaches \(b\), having started at \(a\), or vice versa. For a general interval \([a, b]\), we compute the probability of a single crossing (\(Z = 1\)) as follows. First, we note that in order for a crossing to occur, it must be the case that the price sequence reaches either \(a\) or \(b\). For \(x \in [a, b]\), the probability that a single crossing from \(a\) to \(b\) occurs is equal to

\[ \frac{b - x}{b - a} \cdot \frac{a}{b} \cdot \frac{b - a}{100 - a}, \]

i.e., the probability that the price sequence reaches \(a\) before \(b\), reaches \(b\) before zero starting from \(a\), and then reaches 100 before reaching \(a\) again starting from \(b\), while the probability of a single crossing from \(b\) to \(a\) is

\[ \frac{x - a}{b - a} \cdot \frac{100 - b}{100 - a} \cdot \frac{b - a}{b}, \]

derived similarly.
The probability of a single crossing for \(x \in [a, b]\) is the sum of these two probabilities, since they represent disjoint events. Note that if it is not the case that \(x \in [a, b]\), it must either be true that \(x \le a\) or \(x \ge b\); in these cases, a single crossing from \(b\) to \(a\) or from \(a\) to \(b\), respectively, is impossible, and thus

\[ P(Z = 1 \mid x \le a) = \frac{a}{b} \cdot \frac{b - a}{100 - a}, \qquad P(Z = 1 \mid x \ge b) = \frac{100 - b}{100 - a} \cdot \frac{b - a}{b}. \]

For \(Z \ge 2\), the approach is similar; we simply add terms prior to the end term to account for each subsequent crossing.

In the case where the interval \([a, b]\) is symmetric about 50, the formula is considerably simpler. Information about the first endpoint of the interval the price sequence reaches is irrelevant, since the price is just as likely to cross up from \(a\) to \(b\) as it is to cross down from \(b\) to \(a\). If we write \(b = 100 - a\), it is easily seen that

\[ \frac{a}{b} = \frac{100 - b}{100 - a} = \frac{a}{100 - a}. \]

Thus, the general formula for the probability of \(k\) crossings given that the price sequence ever enters \([a, 100 - a]\) is

\[ P(Z = k) = \left(\frac{a}{100 - a}\right)^{k} \frac{100 - 2a}{100 - a}. \tag{7} \]

Note that this is the formula for a shifted geometric distribution with \(p = (100 - 2a)/(100 - a)\).

Methodology and Results

As the foregoing analysis makes clear, the presumption that prediction market prices may be described as martingales generates a number of predictions that we may test empirically. To this end, we collected price data on Tradesports contracts for 91 baseball games played between August 7 and October 27, 2008. For each such game, data consisted of the price sequence from the opening bid price (the starting price) until the price at termination (either 100 or 0). We used these data to assess the accuracy of the three main theoretical predictions described above, namely: (1) the starting price reflects the probability that a given team will ultimately prevail; (2) the conditional distributions of the minimum and maximum are those given in (5) and (6), respectively; and (3) the distribution of the number of crossings of an interval that is symmetric about 50 is given by (7). For the purposes of testing these predictions, it is clearly desirable that we may treat the games in the data set as independent, identically distributed realizations of a particular random variable.
While the assumption of independence is not difficult to justify, the identical distribution condition poses a slight problem. In particular, the formulae which generate predictions (1) and (2) depend on \(x\), the starting price, which may vary from game to game. Thus, if we consider the achievement of a given minimum (or the failure to achieve a given minimum, for example) as a random indicator variable, our data set is like a series of coin flips where the coins may have different values for \(p\). Thus, it was necessary to adopt strategies to standardize \(p\).

For the purposes of testing prediction (1), games were grouped according to starting price; all games whose starting price was within a given range (e.g., \(50 \le x < 60\)) were placed in the same group, and all groups of equal size with sufficiently many (i.e., more than 10) games were tested. We created two separate partitions by starting price: one had price groups [50, 60) and [60, 70), each of which contained 39 games, while the other had groups [50, 55), [55, 60), and [60, 65). The groups in the second partition contained 19, 20, and 31 games, respectively. For each group in a given partition, the mean starting price was computed. This mean price divided by 100 was taken to be the success probability \(p\) for a series of Bernoulli trials (success in this case is the event that Team 1 wins). Thus, the number of games won by Team 1 in the group as a whole was considered to be a binomially distributed random variable. Using the binomial distribution, we were able to compute the endpoints of the critical interval (that is, the interval in which 95 percent of values would be expected to fall) for each price group. The critical intervals for the two groups in the first partition were [15, 27] and [18, 30], respectively, while the critical intervals for the three groups in the second partition were [6, 14], [7, 16], and [14, 24]. The observed values of the number of victories by Team 1 in each group were 23 and 24 for the first partition, and 10, 13, and 19 for the second.
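The critical-interval computation described above can be sketched as follows. This is a minimal illustration, not the author's procedure: the paper does not say how the interval endpoints were chosen, so the equal-tailed rule below is an assumption, and the mean starting price of 55 in the example is hypothetical because the group means are not reported. The names `binom_cdf` and `central_interval` are likewise introduced here.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0.0 when k < 0."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def central_interval(n, p, level=0.95):
    """Equal-tailed interval [lo, hi] holding at least `level` of the mass.

    Assumption: at most (1 - level) / 2 of the probability is left in each
    tail. The paper does not state its exact rule for the endpoints.
    """
    tail = (1 - level) / 2
    lo = next(k for k in range(n + 1) if binom_cdf(k, n, p) > tail)
    hi = next(k for k in range(n + 1) if binom_cdf(k, n, p) >= 1 - tail)
    return lo, hi


# Hypothetical example: a group of 39 games with mean starting price 55.
n_games = 39
mean_price = 55.0                      # assumed value, not from the paper
lo, hi = central_interval(n_games, mean_price / 100)
coverage = binom_cdf(hi, n_games, mean_price / 100) - binom_cdf(lo - 1, n_games, mean_price / 100)
```

An observed win count for the group would then be compared against `lo` and `hi`; because the binomial distribution is discrete, `coverage` is generally somewhat above the nominal 95 percent.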
See Figs. 1 and 2 in the Appendix for a visual representation of these data. All critical intervals contained the observed counts, and thus there is no strong evidence to reject the null hypothesis that prediction market prices may be modeled as martingales based on this criterion.

With regard to the conditional minima and maxima, we chose to consider all contracts that passed through the value 50. The subsequence beginning at 50 is itself a martingale, and so we take \(x = 50\) to be the starting price for each contract. The choice of 50 was arbitrary, based primarily on simplicity and symmetry: each team won half of the 64 games whose price reached 50 at some point. The martingale theory described above is presumed to apply equally well to these truncated trading periods. Thus, substituting 50 for \(x\) in (5) and (6), we have that

\[ P(m_Y \le a \mid \text{Team 1 wins}) = \frac{a}{100 - a} \qquad \text{and} \qquad P(M_Y < b \mid \text{Team 2 wins}) = \frac{2(b - 50)}{b}. \]

Using these probabilities and the binomial formula as above, we were able to construct the critical intervals for the minimum value 40 conditional on victory by Team 1 and for the maximum value 60 conditional on victory by Team 2. These intervals were then compared with the actual number of games won by Team 1 (respectively, Team 2) whose minimum (maximum) price was below 40 (60). These intervals were [16, 26] and [6, 16]. In 21 of the games won by Team 1, the minimum price reached after 50 was below 40; likewise, in 21 of the games won by Team 2, the maximum price reached after 50 was below 60. While this number is contained in the critical interval for the minimum, it is beyond the range of the critical interval for the maximum. In fact, under the null hypothesis that the maximum probability is as given in (6), the likelihood of getting a sample of 32 games in which 21 or more had a post-50 maximum price less than 60 was virtually zero. This result thus casts serious doubt on whether prediction market prices may in fact be modeled as martingales in the manner described above.

Additionally, for each price less than or equal to 50, we tallied the number of games whose post-50 minimum price was less than or equal to the given price. In this way, we were able to generate the empirical cdf for the minimum. A graph of the empirical and theoretical distribution functions (see Fig. 3) shows a high degree of consonance, and suggests that the martingale model describes such minimum prices quite well.
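The threshold probabilities and the tail probability used in this comparison can be reproduced with a short sketch. It assumes the conditional cdfs take the forms \(a(100 - x)/(x(100 - a))\) for the minimum and \(100(b - x)/(b(100 - x))\) for the maximum, as derived earlier, and that each conditional sample contains 32 games; the function names are introduced here for illustration only.

```python
from math import comb


def cond_min_cdf(a, x):
    """P(min price <= a | Team 1 wins) for a contract starting at price x."""
    return a * (100 - x) / (x * (100 - a))


def cond_max_cdf(b, x):
    """P(max price < b | Team 2 wins) for a contract starting at price x."""
    return 100 * (b - x) / (b * (100 - x))


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Restart every contract at x = 50 and look at the thresholds 40 and 60.
p_min = cond_min_cdf(40, 50)   # probability the post-50 minimum is at most 40
p_max = cond_max_cdf(60, 50)   # probability the post-50 maximum is below 60

# Chance of seeing 21 or more such games among 32 if p_max were correct.
tail_for_maximum = binom_upper_tail(21, 32, p_max)
```

Evaluating `cond_min_cdf(a, 50)` over a grid of prices `a` from 0 to 50 would give the theoretical curve against which an empirical cdf of post-50 minima can be plotted.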
The results are not so agreeable, however, for the empirical cdf of the maximum: the observed number of games where the post-50 maximum price is less than a given price is consistently higher than the predicted number of such games. This is driven in particular by the fact that 14 of the 32 games won by Team 2 after the price reached 50 never reached a price above 50 after hitting 50 for the first time.

Finally, we examine the difference between the observed and expected numbers of crossings of a given symmetric interval. No adjustment is needed to eliminate \(x\) from the relevant formula, since \(a\) is the only parameter in the expression. We arbitrarily selected this \(a\) to be 40, which gives the interval [40, 60]. Eighty-two of the 91 games contained a point in this interval and were thus suitable for analysis. Using (7), for which \(p = 1/3\) for \(a = 40\), we computed the vector of expected crossings to be approximately 27, 18, 12, 8, 16 for 0, 1, 2, 3, and 4 or more crossings, respectively. Note that the sum of the elements in the vector is only 81 due to rounding error. The vector of observed crossings was tabulated to be 38, 22, 13, 6, 1, 2. With these data, we administered a chi-square goodness-of-fit test comparing the observed and expected counts to test the hypothesis that the shifted geometric distribution with \(p = 1/3\) in fact describes the data. The p-value for this test was 0.0026, which implies that the proposed distribution is a bad fit for the data. In particular, it predicts many fewer games with zero crossings and many more with four or more than were in fact observed.

Discussion

The results of this analysis are mixed. At a basic level, it appears that the starting contract price is a fairly accurate predictor of the likelihood that the event will in fact occur. However, predictions regarding the conditional maximum price for a given contract are not supported by these data, nor are those concerning the number of crossings of an interval. In particular, for the markets analyzed in these data, it appears that there are fewer large fluctuations than one would expect using martingale-based theory. We note additionally that the data set contained a disproportionately large number of games (84 of 91) whose starting price was greater than 50. Thus, if it is the case that contracts tend to follow a given trend line more closely than the theory implies, the failure of the theoretical predictions regarding maximum prices may possibly be due to the large number of games that started above 50 and drifted down to zero in a fairly consistent manner.
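The expected counts and the goodness-of-fit test can be sketched as below. Two points are assumptions rather than details given in the text: the six observed counts are taken to correspond to 0 through 5 crossings, with the last two pooled into a "4 or more" category, and the test is taken to have 4 degrees of freedom (five categories, no estimated parameters). With 4 degrees of freedom the chi-square survival function has the closed form \(e^{-s/2}(1 + s/2)\), so no statistics library is needed.

```python
from math import exp


def crossing_pmf(k, a):
    """P(Z = k) for the symmetric interval [a, 100 - a] (shifted geometric)."""
    r = a / (100 - a)            # probability of one more crossing
    return r**k * (1 - r)        # 1 - r equals (100 - 2a) / (100 - a)


a, n_games = 40, 82

# Expected counts for 0, 1, 2, 3 crossings, then "4 or more" as the remainder.
expected = [n_games * crossing_pmf(k, a) for k in range(4)]
expected.append(n_games - sum(expected))

# Assumption: the reported counts 38, 22, 13, 6, 1, 2 are for 0..5 crossings,
# and the last two are pooled to match the "4 or more" category.
observed = [38, 22, 13, 6, 1 + 2]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Survival function of a chi-square variable with 4 degrees of freedom.
p_value = exp(-statistic / 2) * (1 + statistic / 2)
```

Under these assumptions the statistic comes out near 16.3 and the p-value close to the reported 0.0026, with most of the discrepancy coming from the first and last categories.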
It is not clear from this analysis whether this behavior is specific to baseball matches or to Tradesports, or whether it applies to prediction markets in general. It therefore remains undecided whether martingales may actually be used to generate useful predictions for prediction market price movements.

Appendix: Graphs

[Figure 1: Observed counts (left) and proportions (right) of games won by Team 1 for starting price groups [50, 60) and [60, 70). The parentheses mark the critical interval. Axes: starting price group against number (left) and proportion (right) of games won by Team 1.]

[Figure 2: Observed counts (left) and proportions (right) of games won by Team 1 for starting price groups [50, 55), [55, 60), and [60, 65), with their critical intervals. Axes: starting price group against number (left) and proportion (right) of games won by Team 1.]

[Figure 3: Empirical and theoretical cdf for the minimum post-50 price conditional on a victory by Team 1. The points are the observed values, while the curve represents the theoretical values. Axes: minimum price (0 to 50) against proportion.]

[Figure 4: Empirical and theoretical cdf for the maximum post-50 price conditional on a victory by Team 2. Axes: maximum price (50 to 100) against proportion.]

[Figure 5: Observed counts of games won by Team 1 (Team 2, respectively) in which the minimum (maximum) price reached after 50 was below 40 (60), with critical intervals. Axes: threshold (minimum 40, maximum 60) against number of games.]