Graph signal processing for clustering

Graph signal processing for clustering. Nicolas Tremblay, PANAMA Team, INRIA Rennes, with Rémi Gribonval; Signal Processing Laboratory 2, EPFL, Lausanne, with Pierre Vandergheynst.

What's clustering?

Given a series of N objects: 1/ Find adapted descriptors. 2/ Cluster.

After step 1, one has N vectors in d dimensions (the descriptor dimension), x_1, x_2, ..., x_N ∈ R^d, and their distance matrix D ∈ R^{N×N}.
The goal of clustering is to assign a label c(i) ∈ {1, ..., k} to each object i in order to organize / simplify / analyze the data.
There exist two general types of methods:
- methods based directly on the x_i and/or D, such as k-means or hierarchical clustering;
- graph-based methods.

Graph construction from the distance matrix D
Create a graph G = (V, E):
- each node in V is one of the N objects;
- a pair of nodes (i, j) is connected if the associated distance D(i, j) is small enough.
For example, two connectivity possibilities:
- Gaussian kernel: 1. connect all pairs of nodes with links of weight exp(-D(i, j)/σ); 2. remove all links of weight smaller than ɛ.
- k nearest neighbors: connect each node to its k nearest neighbors.
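As an illustration of the two constructions just described, here is a minimal numpy sketch (not code from the talk); the function names, the thresholding convention, and the symmetrization of the k-NN graph are choices of this example.

```python
import numpy as np

def gaussian_kernel_graph(D, sigma, eps):
    """Weighted adjacency from a distance matrix D: Gaussian kernel, then thresholding.

    1. connect all pairs with weight exp(-D(i, j) / sigma);
    2. remove all links of weight smaller than eps (and the diagonal).
    """
    W = np.exp(-D / sigma)
    W[W < eps] = 0.0
    np.fill_diagonal(W, 0.0)
    return W

def knn_graph(D, k):
    """Binary adjacency: connect each node to its k nearest neighbors, then symmetrize."""
    N = D.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        nearest = np.argsort(D[i])[1:k + 1]   # assumes D[i, i] = 0, so index 0 is the node itself
        W[i, nearest] = 1.0
    return np.maximum(W, W.T)                  # keep a link if either endpoint selected it
```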

The problem now reads: given the graph G representing the similarity between the N objects, find a partition of all nodes into k clusters.
Many methods exist [Fortunato 10]:
- modularity (or other cost-function) optimisation methods [Newman 04]
- random walk methods [Delvenne 10]
- methods inspired from statistical physics [Krzakala 12], information theory [Rosvall 07]...
- spectral methods...

Three useful matrices
The adjacency matrix: W = [0 1 1 0; 1 0 1 1; 1 1 0 0; 0 1 0 0]
The degree matrix: S = diag(2, 3, 2, 1)
The Laplacian matrix: L = S - W = [2 -1 -1 0; -1 3 -1 -1; -1 -1 2 0; 0 -1 0 1]
The same graph, now with weighted links:
The adjacency matrix: W = [0 .5 .5 0; .5 0 .5 4; .5 .5 0 0; 0 4 0 0]
The degree matrix: S = diag(1, 5, 1, 4)
The Laplacian matrix: L = S - W = [1 -.5 -.5 0; -.5 5 -.5 -4; -.5 -.5 1 0; 0 -4 0 4]
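A quick numpy check of the unweighted example above (a sketch, assuming the 4-node graph of the slide): it rebuilds S and L from W and confirms that the Laplacian is positive semi-definite with smallest eigenvalue 0.

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # adjacency
S = np.diag(W.sum(axis=1))                  # degree matrix: diag(2, 3, 2, 1)
L = S - W                                   # combinatorial Laplacian

print(np.linalg.eigvalsh(L))                # eigenvalues >= 0, the first one is 0
```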

The classical spectral clustering algorithm [Von Luxburg 06]
Given the N-node graph G of adjacency matrix W:
1. Compute U_k = (u_1 u_2 ... u_k), the first k eigenvectors of L = S - W.
2. Consider each node i as a point in R^k: f_i = U_k^T δ_i.
3. Run k-means with the Euclidean distance D_ij = ||f_i - f_j|| and obtain k clusters.
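The three steps translate almost line for line into code. The sketch below is one possible implementation, not the speaker's; it uses scipy's sparse eigensolver and scipy's k-means (eigsh with which='SM' is slow on large graphs, which is precisely the bottleneck discussed later).

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh
from scipy.cluster.vq import kmeans2

def spectral_clustering(W, k):
    """Classical spectral clustering from an adjacency matrix W (dense or sparse)."""
    W = csr_matrix(W)
    S = diags(np.asarray(W.sum(axis=1)).ravel())
    L = S - W
    # 1. U_k: the k eigenvectors of L associated with the smallest eigenvalues.
    _, U_k = eigsh(L, k=k, which='SM')
    # 2. Node i is the i-th row of U_k, a point in R^k.
    # 3. k-means on these N points.
    _, labels = kmeans2(U_k, k, minit='++')
    return labels
```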

What's the point of using a graph? N points in d = 2 dimensions. [Figures: the result with k-means (k = 2), and the result after creating a graph from the N points' pairwise distances and running the spectral clustering algorithm (k = 2).]

Computation bottlenecks of the spectral clustering algorithm
When N and/or k become too large, there are two main bottlenecks in the algorithm:
1. The partial eigendecomposition of the Laplacian.
2. k-means.
Our goal: circumvent both!

What's graph signal processing?

What's a graph signal? [Figures: temperature curves, in °C, over the months of the year and over the hours of the day ("Heure"), used to illustrate the notion of a signal defined on the nodes of a graph.]

What's the graph Fourier matrix? [Hammond 11]
The classical graph (a ring): L_cl = [2 -1 0 ... 0 -1; -1 2 -1 0 ... 0; ...; 0 ... 0 -1 2 -1; -1 0 ... 0 -1 2]. All classical Fourier modes are eigenvectors of L_cl.
Any graph, with Laplacian L: by analogy, any graph's Fourier modes are the eigenvectors of its Laplacian matrix L.

The graph Fourier matrix L = S - W. Its eigenvectors U = (u_1 u_2 ... u_N) form the graph Fourier orthonormal basis. Its eigenvalues 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_N represent the graph frequencies: λ_i is the squared frequency associated to the Fourier mode u_i.

Illustration: [Figures: a low-frequency Fourier mode and a high-frequency Fourier mode on a graph.]

The Fourier transform
Given f ∈ R^N, a signal on a graph of size N, f̂ is obtained by decomposing f on the eigenvectors u_i:
f̂ = (<u_1, f>, <u_2, f>, ..., <u_N, f>)^T, i.e. f̂ = U^T f.
Inversely, the inverse Fourier transform reads f = U f̂.
The Parseval theorem stays valid: for all (g, h), <g, h> = <ĝ, ĥ>.
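A toy numpy sketch of the transform pair and of Parseval (the random 6-node graph is an assumption of this example, not data from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 6-node weighted graph.
A = rng.random((6, 6))
W = np.triu(A, 1) + np.triu(A, 1).T           # symmetric adjacency, zero diagonal
L = np.diag(W.sum(axis=1)) - W                # Laplacian

lam, U = np.linalg.eigh(L)                    # graph frequencies and Fourier basis

f = rng.standard_normal(6)                    # a signal on the 6 nodes
f_hat = U.T @ f                               # forward graph Fourier transform
f_back = U @ f_hat                            # inverse transform

assert np.allclose(f, f_back)                 # f is recovered
assert np.isclose(f @ f, f_hat @ f_hat)       # Parseval: <f, f> = <f_hat, f_hat>
```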

Filtering
Given a filter function g defined in the Fourier space [figure: g(λ) plotted against λ].
In the Fourier space, the signal filtered by g reads:
f̂_g = (f̂(1) g(λ_1), f̂(2) g(λ_2), ..., f̂(N) g(λ_N))^T = Ĝ f̂, with Ĝ = diag(g(λ_1), g(λ_2), ..., g(λ_N)).
In the node space, the filtered signal therefore reads: f_g = U Ĝ U^T f = G f.
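A direct transcription of f_g = U Ĝ U^T f, using a full eigendecomposition (which is exactly what the fast method below avoids); the exponential low-pass in the comment is just an arbitrary example filter.

```python
import numpy as np

def filter_signal(L, f, g):
    """Filter the graph signal f with the spectral filter g: f_g = U diag(g(lambda)) U^T f."""
    lam, U = np.linalg.eigh(L)
    return U @ (g(lam) * (U.T @ f))

# e.g. a smooth low-pass filter g(lambda) = exp(-2 * lambda):
# f_lp = filter_signal(L, f, lambda lam: np.exp(-2.0 * lam))
```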

So where's the link?

Remember: the classical spectral clustering algorithm
Given the N-node graph G of adjacency matrix W:
1. Compute U_k = (u_1 u_2 ... u_k), the first k eigenvectors of L = S - W.
2. Consider each node i as a point in R^k: f_i = U_k^T δ_i.
3. Run k-means with the Euclidean distance D_ij = ||f_i - f_j|| and obtain k clusters.
Let's work on the first bottleneck: estimate D_ij without partially diagonalizing the Laplacian matrix.

Ideal low-pass filtering. 1st step: assume we know U_k and λ_k.
Given h_λk an ideal low-pass filter [figure: h_λk(λ) equals 1 for λ ≤ λ_k and 0 above], H_λk = U Ĥ_λk U^T = U_k U_k^T is its filter matrix.
Let R = (r_1 r_2 ... r_η) ∈ R^{N×η} be a random Gaussian matrix. We define f̃_i = (H_λk R)^T δ_i ∈ R^η and D̃_ij = ||f̃_i - f̃_j||.
Norm conservation theorem for the ideal filter. Let ɛ > 0. If η > η_0 log(N) / ɛ², then, with probability > 1 - 1/N, we have, for all (i, j) ∈ [1, N]²: (1 - ɛ) D_ij ≤ D̃_ij ≤ (1 + ɛ) D_ij.
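A sketch of the feature map f̃_i used in the theorem, in the idealized setting where U_k is computed exactly; the 1/sqrt(η) normalization of R is an assumption of this sketch, chosen so that the random distances concentrate around the spectral ones.

```python
import numpy as np

def ideal_lowpass_features(L, k, eta, rng=None):
    """Return an N x eta matrix whose i-th row is f~_i = (H_lambda_k R)^T delta_i."""
    rng = np.random.default_rng() if rng is None else rng
    N = L.shape[0]
    _, U = np.linalg.eigh(L)
    U_k = U[:, :k]                               # ideal filter matrix: H = U_k U_k^T
    R = rng.standard_normal((N, eta)) / np.sqrt(eta)
    return U_k @ (U_k.T @ R)                     # H_lambda_k R

# D~_ij is then np.linalg.norm(F[i] - F[j]) with F = ideal_lowpass_features(L, k, eta).
```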

Non-ideal low-pass filtering. 2nd step: assume all we know is λ_k.
In practice, we use a polynomial approximation of order m of h_λk: h̃_λk(λ) = Σ_{l=1..m} α_l λ^l ≈ h_λk(λ) [figure: the ideal filter and its polynomial approximations for m = 100, 20, 5].
Indeed, in this case, filtering a vector x reads: H̃_λk x = U h̃_λk(Λ) U^T x = U (Σ_{l=1..m} α_l Λ^l) U^T x = Σ_{l=1..m} α_l L^l x.
This does not require the knowledge of U_k, and only involves matrix-vector multiplications [cost O(m |E|)].
The theorem stays (more or less) valid with this non-ideal filtering!
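Applying the polynomial filter only through products with L can look like the sketch below; the monomial coefficients (indexed from l = 0 here) are assumed to come from whatever approximation of h_λk one has fitted, and a Chebyshev expansion with the corresponding recurrence is the better-conditioned choice in practice.

```python
import numpy as np

def poly_filter(L, X, alphas):
    """Apply the polynomial filter sum_l alphas[l] * L^l to the columns of X.

    Only matrix-vector products with L are used, so L can be a large sparse matrix;
    the cost is O(m |E|) per signal.
    """
    Y = alphas[0] * X
    P = X
    for a in alphas[1:]:
        P = L @ P              # P now holds L^l X
        Y = Y + a * P
    return Y
```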

Last step: estimate λ_k.
Goal: given L, estimate its k-th eigenvalue as fast as possible.
We use eigencount techniques (also based on polynomial filtering of random vectors!): given an interval [0, b], get an approximation of the number of eigenvalues it encloses, and find λ_k by dichotomy on b.
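One way to realize such an eigencount with nothing but matrix-vector products is a Hutchinson-type trace estimate of a polynomial approximation of the step function 1_{λ ≤ b}. The sketch below (Chebyshev fit, Gaussian probe vectors, Gershgorin bound for λ_max) is an illustration under those assumptions, not the exact procedure of the talk.

```python
import numpy as np

def eigencount_below(L, b, n_vec=30, order=50, lam_max=None, rng=None):
    """Rough estimate of #{eigenvalues of the Laplacian L that are <= b}."""
    rng = np.random.default_rng() if rng is None else rng
    N = L.shape[0]
    if lam_max is None:
        lam_max = 2.0 * L.diagonal().max()        # Gershgorin bound for a graph Laplacian
    grid = np.linspace(0.0, lam_max, 400)
    target = (grid <= b).astype(float)            # the step function to approximate
    coeffs = np.polynomial.chebyshev.chebfit(2.0 * grid / lam_max - 1.0, target, order)
    total = 0.0
    for _ in range(n_vec):
        r = rng.standard_normal(N)
        # Chebyshev recurrence on the rescaled operator A = 2 L / lam_max - I.
        t_prev = r
        t_cur = (2.0 / lam_max) * (L @ r) - r
        acc = coeffs[0] * t_prev + coeffs[1] * t_cur
        for c in coeffs[2:]:
            t_next = 2.0 * ((2.0 / lam_max) * (L @ t_cur) - t_cur) - t_prev
            acc = acc + c * t_next
            t_prev, t_cur = t_cur, t_next
        total += r @ acc                          # Hutchinson estimate of tr(p(L))
    return total / n_vec
```

λ_k itself is then bracketed by bisection on b, shrinking [0, λ_max] until the estimated count crosses k.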

Accelerated spectral algorithm
Given the N-node graph G of adjacency matrix W:
1. Estimate λ_k, the k-th eigenvalue of L.
2. Generate η random graph signals in a matrix R ∈ R^{N×η}.
3. Filter them with H̃_λk and treat each node i as a point in R^η: f̃_i = (H̃_λk R)^T δ_i.
4. Run k-means with the Euclidean distance D̃_ij = ||f̃_i - f̃_j|| and obtain k clusters.
Let's work on the second bottleneck: avoid k-means on a possibly very large number N of points (step 4).

Fast spectral algorithm?
Given the N-node graph G of adjacency matrix W:
1. Estimate λ_k, the k-th eigenvalue of L.
2. Generate η random graph signals in a matrix R ∈ R^{N×η}.
3. Filter them with H̃_λk and treat each node i as a point in R^η: f̃_i = (H̃_λk R)^T δ_i.
4. Randomly sample ρ ~ k log k << N nodes out of N, with sampling matrix M ∈ R^{ρ×N}: f̃_i^r = (M H̃_λk R)^T δ_i^r.
5. Run k-means in this reduced space with the Euclidean distance D̃_ij^r = ||f̃_i^r - f̃_j^r|| and obtain k clusters.
6. Interpolate the cluster indicator functions c_l^r onto the whole graph: c_l = argmin_{x ∈ R^N} ||M x - c_l^r||² + µ x^T L x (a least-squares sketch of this step follows below).
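Step 6 is a graph-regularized least-squares problem; one way to solve it is through the normal equations (M^T M + µ L) x = M^T c_l^r. The sketch below assumes L is given as a (sparse) matrix, sampled_idx lists the ρ sampled nodes, and c_r is the indicator of one cluster on those nodes; it is a generic solver, not the exact one of the submitted paper.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve

def interpolate_indicator(L, sampled_idx, c_r, mu=1e-2):
    """Extend a cluster indicator known on the sampled nodes to all N nodes.

    Solves argmin_x ||M x - c_r||^2 + mu * x^T L x via the normal equations
    (M^T M + mu L) x = M^T c_r, where M^T M is a 0/1 diagonal marking sampled nodes.
    """
    N = L.shape[0]
    rho = len(sampled_idx)
    M = csr_matrix((np.ones(rho), (np.arange(rho), sampled_idx)), shape=(rho, N))
    A = (M.T @ M) + mu * csr_matrix(L)
    b = M.T @ c_r
    return spsolve(A.tocsc(), b)
```

Each node can then be assigned, for instance, to the cluster whose interpolated indicator value is largest.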

Compressive spectral clustering: a summary
1. Generate a feature vector for each node by filtering a few random Gaussian signals on G;
2. subsample the set of nodes;
3. cluster the reduced set of nodes;
4. interpolate the cluster indicator vectors back to the complete graph.

This work was done in collaboration with Gilles PUY and Rémi GRIBONVAL from the PANAMA team (INRIA), and Pierre VANDERGHEYNST from EPFL.
Part of this work has been published (or submitted): circumventing the first bottleneck has been accepted to ICASSP 2016; interpolation of k-bandlimited graph signals has been submitted to ACHA in November (an application of which helps us circumvent the second bottleneck).

Perspectives and difficult questions
Two difficult questions (among others):
1. Given a positive semi-definite matrix, how to estimate its k-th eigenvalue, and only that one, as fast as possible?
2. How to subsample ρ nodes out of N while ensuring that clustering them into k classes gives the result one would have obtained by clustering all N nodes?
Perspectives:
1. What if nodes are added one by one?
2. Rational filters instead of polynomial filters?
3. Approximating other spectral clustering algorithms?