A Learning Theory of Ranking Aggregation

Similar documents
Controlling the distance to the Kemeny consensus without computing it

For every job, the start time on machine j+1 is greater than or equal to the completion time on machine j.

Lossy compression of permutations

Lie Algebras and Representation Theory Homework 7

Yao's Minimax Principle

The assignment game: Decentralized dynamics, rate of convergence, and equitable core selection

E&G, Ch. 8: Multi-Index Models & Grouping Techniques I. Multi-Index Models.

Techniques for Calculating the Efficient Frontier

When Do Noisy Votes Reveal the Truth?

Comparing Partial Rankings

THE TRAVELING SALESMAN PROBLEM FOR MOVING POINTS ON A LINE

More Advanced Single Machine Models. University at Buffalo IE661 Scheduling Theory 1

Consumption and Asset Pricing

Hardy Weinberg Model- 6 Genotypes

When Do Noisy Votes Reveal the Truth?

Monte-Carlo Planning: Introduction and Bandit Basics. Alan Fern

P VaR0.01 (X) > 2 VaR 0.01 (X). (10 p) Problem 4

Monte-Carlo Planning: Introduction and Bandit Basics. Alan Fern

2.1 Mathematical Basis: Risk-Neutral Pricing

IEOR E4004: Introduction to OR: Deterministic Models

Reliable region predictions for Automated Valuation Models

Optimum Thresholding for Semimartingales with Lévy Jumps under the mean-square error

The Value of Stochastic Modeling in Two-Stage Stochastic Programs

Lecture 4. Finite difference and finite element methods

3.4 Copula approach for modeling default dependency. Two aspects of modeling the default times of several obligors

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMSN50)

Cost Sharing in a Job Scheduling Problem

Final Exam Suggested Solutions

DRAFT. 1 exercise in state (S, t), π(s, t) = 0 do not exercise in state (S, t) Review of the Risk Neutral Stock Dynamics

Ultra High Frequency Volatility Estimation with Market Microstructure Noise. Yacine Aït-Sahalia. Per A. Mykland. Lan Zhang

TTIC An Introduction to the Theory of Machine Learning. Learning and Game Theory. Avrim Blum 5/7/18, 5/9/18

2. This algorithm does not solve the problem of finding a maximum cardinality set of non-overlapping intervals. Consider the following intervals:

Midterm Exam. b. What are the continuously compounded returns for the two stocks?

Importance sampling and Monte Carlo-based calibration for time-changed Lévy processes

Lecture 23. STAT 225 Introduction to Probability Models April 4, Whitney Huang Purdue University. Normal approximation to Binomial

Large-Scale SVM Optimization: Taking a Machine Learning Perspective

Copyright 2005 Pearson Education, Inc. Slide 6-1

Parametric Inference and Dynamic State Recovery from Option Panels. Nicola Fusari

Chapter 7. Sampling Distributions and the Central Limit Theorem

Chair of Communications Theory, Prof. Dr.-Ing. E. Jorswieck. Übung 5: Supermodular Games

ON A PROBLEM BY SCHWEIZER AND SKLAR

Applications of Linear Programming

Optimal Dividend Policy of A Large Insurance Company with Solvency Constraints. Zongxia Liang

Implied Systemic Risk Index (work in progress, still at an early stage)

AN ALGORITHM FOR FINDING SHORTEST ROUTES FROM ALL SOURCE NODES TO A GIVEN DESTINATION IN GENERAL NETWORKS*

OPTIMAL PORTFOLIO CONTROL WITH TRADING STRATEGIES OF FINITE

Inversion Formulae on Permutations Avoiding 321

The Assignment Problem

Limit Theorems for the Empirical Distribution Function of Scaled Increments of Itô Semimartingales at high frequencies

Stochastic Proximal Algorithms with Applications to Online Image Recovery

On Complexity of Multistage Stochastic Programs

Eco 504, Macroeconomic Theory II Final exam, Part 1, Monetary Theory and Policy, with Solutions

Equity correlations implied by index options: estimation and model uncertainty analysis

Preliminary Notions in Game Theory

Kutay Cingiz, János Flesch, P. Jean-Jacques Herings, Arkadi Predtetchinski. Doing It Now, Later, or Never RM/15/022

Counting Basics. Venn diagrams

Pricing and hedging in incomplete markets

Elif Özge Özdamar T Reinforcement Learning - Theory and Applications February 14, 2006

Essays on Some Combinatorial Optimization Problems with Interval Data

Optimization Models in Financial Mathematics

Choosing to Rank. Stephen Ragain, Johan Ugander Management Science and Engineering Stanford University

Answer Key for M. A. Economics Entrance Examination 2017 (Main version)

REVERSE-ENGINEERING COUNTRY RISK RATINGS: A COMBINATORIAL NON-RECURSIVE MODEL. Peter L. Hammer Alexander Kogan Miguel A. Lejeune

Methods and Models of Loss Reserving Based on Run Off Triangles: A Unifying Survey

Chapter 7. Sampling Distributions and the Central Limit Theorem

Martingales. by D. Cox December 2, 2009

Kernel Conditional Quantile Estimation via Reduction Revisited

Discrete time interest rate models

A random variable (r. v.) is a variable whose value is a numerical outcome of a random phenomenon.

Orthogonality to the value group is the same as generic stability in C-minimal expansions of ACVF

arxiv: v1 [q-fin.rm] 20 Jan 2011

Chapter 4: Asymptotic Properties of MLE (Part 3)

Homework Assignments

A Property Equivalent to n-permutability for Infinite Groups

Practice Problems 1: Moral Hazard

Financial Linkages, Portfolio Choice and Systemic Risk

ADDING A LOT OF COHEN REALS BY ADDING A FEW II. 1. Introduction

Sublinear Time Algorithms Oct 19, Lecture 1

Understanding Deep Learning Requires Rethinking Generalization

Performance Measurement and Best Practice Benchmarking of Mutual Funds:

SYLLABUS AND SAMPLE QUESTIONS FOR MSQE (Program Code: MQEK and MQED) Syllabus for PEA (Mathematics), 2013

BMI/CS 776 Lecture #15: Multiple Alignment - ProbCons. Colin Dewey

Noureddine Kouaissah, Sergio Ortobelli, Tomas Tichy University of Bergamo, Italy and VŠB-Technical University of Ostrava, Czech Republic

MS-E2114 Investment Science Lecture 5: Mean-variance portfolio theory

Financial Economics 4: Portfolio Theory

MACROECONOMICS. Prelim Exam

X i = 124 MARTINGALES

On Sensitivity Value of Pair-Matched Observational Studies

PORTFOLIO MODELLING USING THE THEORY OF COPULA IN LATVIAN AND AMERICAN EQUITY MARKET

Lecture 23: April 10

ERM (Part 1) Measurement and Modeling of Depedencies in Economic Capital. PAK Study Manual

Optimal Portfolio Inputs: Various Methods

SYLLABUS AND SAMPLE QUESTIONS FOR MS(QE) Syllabus for ME I (Mathematics), 2012

REORDERING AN EXISTING QUEUE

Journal of Computational and Applied Mathematics. The mean-absolute deviation portfolio selection problem with interval-valued returns

Basic Procedure for Histograms

A relation on 132-avoiding permutation patterns

IEOR 130 Review. Methods for Manufacturing Improvement. Prof. Robert C. Leachman University of California at Berkeley.

Implementing the HJM model by Monte Carlo Simulation

Transcription:

A Learning Theory of Ranking Aggregation. France/Japan Machine Learning Workshop. Anna Korba, Stephan Clémençon, Eric Sibony. November 14, 2017, Télécom ParisTech.

Outline: 1. The Ranking Aggregation Problem. 2. Statistical Framework. 3. Results.

The Ranking Aggregation Problem

The Ranking Aggregation Problem: Framework.
n items: {1, ..., n}. N rankings of the n items (from most preferred to least preferred): i_1 ≻ i_2 ≻ ... ≻ i_n.
A ranking i_1 ≻ ... ≻ i_n corresponds to a permutation σ on {1, ..., n} such that σ(i_j) = j.
Suppose we have a dataset of rankings/permutations (σ_1, ..., σ_N) ∈ S_n^N.
Consensus Ranking: we want to find a global order (a "consensus") σ* on the n items that best represents the dataset.
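
As a toy illustration (not from the slides), the encoding of a ranking as a permutation can be written in a few lines of Python; items are assumed to be labelled 0, ..., n-1 and ranks are 0-based, whereas the slides use 1-based indices.

    # Minimal sketch (assumed 0-based labelling): a ranking is a list of items
    # from most preferred to least preferred; sigma[item] = rank position j.
    def ranking_to_permutation(ranking):
        sigma = [0] * len(ranking)
        for j, item in enumerate(ranking):
            sigma[item] = j
        return sigma

    # Items {0, 1, 2} ranked 2 > 0 > 1 give sigma = [1, 2, 0].
    print(ranking_to_permutation([2, 0, 1]))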

Methods for Ranking Aggregation.
Copeland rule. Sort the items according to their Copeland score, defined for each item i by:
s_C(i) = (1/N) ∑_{t=1}^{N} ∑_{j≠i} I{σ_t(i) < σ_t(j)}.
Kemeny's rule (1959). Find a solution of:
min_{σ ∈ S_n} ∑_{t=1}^{N} d(σ, σ_t),   (1)
where d is the Kendall tau distance:
d_τ(σ, σ') = ∑_{i<j} I{(σ(i) - σ(j))(σ'(i) - σ'(j)) < 0}.
Kemeny's consensus has many interesting properties, but it is NP-hard to compute, even for N = 4 rankings (see Dwork et al., 2001).
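
As an illustrative sketch (mine, not from the slides), both criteria take a few lines of Python once the dataset is represented as a list of 0-based permutations with sigma[item] = rank:

    from itertools import combinations

    def copeland_scores(dataset, n):
        # s_C(i) = (1/N) * sum_t sum_{j != i} 1{sigma_t(i) < sigma_t(j)}
        N = len(dataset)
        return [sum(1 for sigma in dataset for j in range(n)
                    if j != i and sigma[i] < sigma[j]) / N
                for i in range(n)]

    def kendall_tau(sigma_a, sigma_b):
        # Number of pairs {i, j} ranked in opposite order by the two permutations.
        n = len(sigma_a)
        return sum(1 for i, j in combinations(range(n), 2)
                   if (sigma_a[i] - sigma_a[j]) * (sigma_b[i] - sigma_b[j]) < 0)

    def kemeny_objective(sigma, dataset):
        # Empirical Kemeny cost: sum_t d_tau(sigma, sigma_t).
        return sum(kendall_tau(sigma, sigma_t) for sigma_t in dataset)

Sorting items by decreasing Copeland score gives the Copeland ranking; minimizing kemeny_objective over S_n gives a Kemeny consensus, which is the part that becomes intractable as n grows.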

Statistical Framework

Statistical Reformulation.
Suppose the dataset is composed of N i.i.d. copies Σ_1, ..., Σ_N of a r.v. Σ ∼ P. A (Kemeny) median of P w.r.t. d is a solution of:
min_{σ ∈ S_n} E_{Σ∼P}[d(Σ, σ)],
where L(σ) = E_{Σ∼P}[d(Σ, σ)] is the risk of σ. Let L̂_N(σ) = (1/N) ∑_{t=1}^{N} d(Σ_t, σ).
Goal of our analysis: study the performance of empirical Kemeny medians, i.e. solutions σ̂_N of min_{σ ∈ S_n} L̂_N(σ), through the excess risk L(σ̂_N) - L*, where L* = min_{σ ∈ S_n} L(σ). We establish links with the Copeland method.
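
For small n, empirical Kemeny minimization can simply be done by exhaustive search over S_n; a minimal sketch (my own illustration):

    from itertools import combinations, permutations

    def empirical_kemeny_median(dataset, n):
        # Brute-force argmin over S_n of sum_t d_tau(sigma, Sigma_t); feasible only for small n.
        def kendall_tau(a, b):
            return sum(1 for i, j in combinations(range(n), 2)
                       if (a[i] - a[j]) * (b[i] - b[j]) < 0)
        return min(permutations(range(n)),
                   key=lambda s: sum(kendall_tau(s, t) for t in dataset))

    # Example: three observed rankings over n = 3 items (sigma[item] = rank).
    data = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]
    print(empirical_kemeny_median(data, 3))  # (0, 1, 2)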

Risk of Ranking Aggregation.
The risk of a candidate median σ is L(σ) = E_{Σ∼P}[d(Σ, σ)], where d is the Kendall tau distance:
d(σ, σ') = ∑_{i<j} I{(σ(i) - σ(j))(σ'(i) - σ'(j)) < 0}.
Let p_{i,j} = P[Σ(i) < Σ(j)] be the probability that item i is preferred to item j. The risk can be rewritten:
L(σ) = ∑_{i<j} p_{i,j} I{σ(i) > σ(j)} + ∑_{i<j} (1 - p_{i,j}) I{σ(i) < σ(j)}.
So if there exists a permutation σ* verifying
(σ*(j) - σ*(i)) (p_{i,j} - 1/2) > 0 for all i < j such that p_{i,j} ≠ 1/2,   (2)
it is necessarily a median for P.
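
A quick sketch (illustrative, not from the slides) of this pairwise reformulation: estimate the p_{i,j} from the sample and evaluate the risk of any candidate σ directly from them.

    from itertools import combinations

    def pairwise_probabilities(dataset, n):
        # p[i][j] estimates P[Sigma(i) < Sigma(j)], i.e. item i preferred to item j.
        N = len(dataset)
        p = [[0.0] * n for _ in range(n)]
        for i, j in combinations(range(n), 2):
            p[i][j] = sum(1 for sigma in dataset if sigma[i] < sigma[j]) / N
            p[j][i] = 1.0 - p[i][j]
        return p

    def risk_from_pairwise(sigma, p):
        # L(sigma) = sum_{i<j} p_ij 1{sigma(i) > sigma(j)} + (1 - p_ij) 1{sigma(i) < sigma(j)}
        n = len(sigma)
        return sum(p[i][j] if sigma[i] > sigma[j] else 1.0 - p[i][j]
                   for i, j in combinations(range(n), 2))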

Preference cycles.
[Diagram: three items 1, 2, 3 arranged in a cycle, with p_{1,2} > 1/2, p_{2,3} > 1/2 and p_{3,1} > 1/2.]
In that case, no permutation can satisfy condition (2) for each pair of items!
Definition. P on S_n is stochastically transitive if, for all (i, j, k) ∈ {1, ..., n}³, p_{i,j} ≥ 1/2 and p_{j,k} ≥ 1/2 imply p_{i,k} ≥ 1/2. Moreover, if p_{i,j} ≠ 1/2 for all i < j, P is strictly stochastically transitive.
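
These two conditions are easy to check on a pairwise probability matrix; a small illustrative checker (mine, not from the slides):

    from itertools import combinations, permutations

    def is_stochastically_transitive(p, n):
        # p[i][j] >= 1/2 and p[j][k] >= 1/2 must imply p[i][k] >= 1/2 for every triple.
        return all(p[i][k] >= 0.5
                   for i, j, k in permutations(range(n), 3)
                   if p[i][j] >= 0.5 and p[j][k] >= 0.5)

    def is_strictly_stochastically_transitive(p, n):
        # Stochastic transitivity plus p[i][j] != 1/2 for every pair i < j.
        return (is_stochastically_transitive(p, n)
                and all(p[i][j] != 0.5 for i, j in combinations(range(n), 2)))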

Results

Optimality
Theorem. If P is stochastically transitive, there exists σ* ∈ S_n verifying condition (2):
(σ*(j) - σ*(i)) (p_{i,j} - 1/2) > 0 for all i < j such that p_{i,j} ≠ 1/2.
The Copeland score of an item i, that is
s*(i) = 1 + ∑_{k≠i} I{p_{i,k} < 1/2},
defines a permutation s* ∈ S_n, and it is the unique median of P iff P is strictly stochastically transitive.
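
Read as an algorithm, the theorem says the population Kemeny median can be obtained directly from the pairwise probabilities under strict stochastic transitivity; a minimal sketch (my own, keeping the 1-based scores of the slide):

    def copeland_ranking(p, n):
        # s*(i) = 1 + #{k != i : p[i][k] < 1/2}; under strict stochastic transitivity
        # this is a permutation of {1, ..., n} and the unique Kemeny median of P.
        return [1 + sum(1 for k in range(n) if k != i and p[i][k] < 0.5)
                for i in range(n)]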

Universal Rates
The excess risk of σ̂_N is upper bounded:
(i) in expectation: E[L(σ̂_N) - L*] ≤ n(n-1) / (2√N);
(ii) with probability higher than 1 - δ, for any δ ∈ (0, 1): L(σ̂_N) - L* ≤ (n(n-1)/2) √(2 log(n(n-1)/δ) / N).
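
To get a feel for the scaling, the two bounds are easy to evaluate numerically (my own illustration, not from the slides):

    from math import log, sqrt

    def universal_bounds(n, N, delta=0.05):
        # Expectation bound and high-probability bound on the excess risk, as on the slide.
        exp_bound = n * (n - 1) / (2 * sqrt(N))
        hp_bound = (n * (n - 1) / 2) * sqrt(2 * log(n * (n - 1) / delta) / N)
        return exp_bound, hp_bound

    # With n = 10 items and N = 10,000 rankings: roughly (0.45, 1.74).
    print(universal_bounds(10, 10_000))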

Conditions for Fast Rates
Suppose that P verifies:
the Stochastic Transitivity condition: p_{i,j} ≥ 1/2 and p_{j,k} ≥ 1/2 imply p_{i,k} ≥ 1/2;
the Low-Noise condition NA(h) for some h > 0: min_{i<j} |p_{i,j} - 1/2| ≥ h.
This kind of condition was introduced for binary classification (see Koltchinskii and Beznosova, 2005) and used for the estimation of the matrix of pairwise probabilities (see Shah et al., 2016).
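
The low-noise margin is directly computable from a pairwise probability matrix (illustrative sketch, mine):

    from itertools import combinations

    def low_noise_margin(p, n):
        # h = min_{i<j} |p_ij - 1/2|; NA(h) holds for this value if it is positive.
        return min(abs(p[i][j] - 0.5) for i, j in combinations(range(n), 2))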

Fast rates
Let α_h = (1/2) log(1/(1 - 4h²)). Assume that P satisfies the previous conditions.
(i) For any empirical Kemeny median σ̂_N, we have: E[L(σ̂_N) - L*] ≤ (n²(n-1)²/8) e^{-α_h N}.
(ii) With probability at least 1 - (n(n-1)/4) e^{-α_h N}, the empirical Copeland score ŝ_N(i) = 1 + ∑_{k≠i} I{p̂_{i,k} < 1/2}, for 1 ≤ i ≤ n, belongs to S_n and is the unique solution of Kemeny empirical minimization.
In practice: under the needed conditions, the Copeland method (complexity O(N·n(n-1)/2)) outputs the Kemeny consensus (NP-hard in general) with high probability.
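
Putting the pieces together (my own sketch, not from the slides): the plug-in procedure suggested by this result estimates the pairwise probabilities and sorts by the empirical Copeland score, which under the above conditions coincides with the empirical Kemeny median with high probability.

    from itertools import combinations

    def empirical_copeland_consensus(dataset, n):
        # hat s_N(i) = 1 + #{k != i : hat p_{i,k} < 1/2}, computed in O(N * n^2).
        N = len(dataset)
        score = [1] * n
        for i, k in combinations(range(n), 2):
            p_ik = sum(1 for sigma in dataset if sigma[i] < sigma[k]) / N
            if p_ik < 0.5:
                score[i] += 1
            elif p_ik > 0.5:
                score[k] += 1
            # ties (p_ik == 0.5) are ruled out under the low-noise condition
        return score

    # Example with the tiny dataset used earlier: returns [1, 2, 3] (1-based ranks),
    # which matches the brute-force empirical Kemeny median (0, 1, 2) in 0-based ranks.
    data = [[0, 1, 2], [0, 2, 1], [1, 0, 2]]
    print(empirical_copeland_consensus(data, 3))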

Conclusion and future directions
We introduced a general statistical framework for ranking aggregation and established rates of convergence.
The empirical risk can also be written when one observes pairwise comparisons only, instead of full rankings.
The conditions for fast rates have a good chance of being satisfied in a homogeneous population, so the Copeland method could be useful for local methods.