Graph signal processing for clustering. Nicolas Tremblay, PANAMA Team, INRIA Rennes (with Rémi Gribonval); Signal Processing Laboratory 2, EPFL, Lausanne (with Pierre Vandergheynst).
What's clustering?

N. Tremblay, "Graph signal processing for clustering", Rennes, 13th of January 2016.
Given a series of N objects:
1/ Find adapted descriptors
2/ Cluster
After step 1, one has: N vectors in d dimensions (the descriptor dimension), x_1, x_2, ..., x_N ∈ R^d, and their distance matrix D ∈ R^{N×N}.

The goal of clustering is to assign a label c(i) = 1, ..., k to each object i, in order to organize / simplify / analyze the data.

There exist two general types of methods:
- methods directly based on the x_i and/or D, like k-means or hierarchical clustering;
- graph-based methods.
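The first type of method can be sketched in a few lines. Below is a minimal, hedged illustration of k-means (Lloyd's algorithm with a deterministic farthest-first initialisation; the two-blob data and the initialisation are illustrative choices, not the speaker's implementation):

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Minimal Lloyd's algorithm on the rows of X (shape N x d)."""
    # deterministic farthest-first initialisation of the k centers
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assign each point to its nearest center
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # recompute each center as the mean of its cluster
        new = np.array([X[labels == j].mean(0) if (labels == j).any() else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# two well-separated blobs in d = 2 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
labels = kmeans(X, k=2)
```

On such convex, well-separated blobs k-means succeeds; the later slides show where it fails and why a graph helps.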
Graph construction from the distance matrix D

Create a graph G = (V, E):
- each node in V is one of the N objects;
- each pair of nodes (i, j) is connected if the associated distance D(i, j) is small enough.

For example, two connectivity possibilities:
- Gaussian kernel: 1. connect all pairs of nodes with links of weight exp(−D(i, j)/σ); 2. remove all links of weight smaller than ε.
- k nearest neighbors: connect each node to its k nearest neighbors.
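Both connectivity rules can be sketched in a few lines of numpy. This is a hedged illustration (the toy positions, σ, ε and k are made-up values), not code from the talk:

```python
import numpy as np

def gaussian_graph(D, sigma, eps):
    """Gaussian-kernel graph: W(i,j) = exp(-D(i,j)/sigma), then drop weights < eps."""
    W = np.exp(-D / sigma)
    np.fill_diagonal(W, 0)          # no self-loops
    W[W < eps] = 0
    return W

def knn_graph(D, k):
    """Connect each node to its k nearest neighbors (symmetrised, binary weights)."""
    N = len(D)
    W = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(D[i])[1:k + 1]   # skip index i itself (distance 0)
        W[i, nn] = 1
    return np.maximum(W, W.T)       # keep an edge if either endpoint selects it

# toy distance matrix for 4 points on a line, at positions 0, 1, 2 and 10
pos = np.array([0., 1., 2., 10.])
D = np.abs(pos[:, None] - pos[None, :])
W_g = gaussian_graph(D, sigma=1.0, eps=0.05)
W_k = knn_graph(D, k=1)
```

Note that the far-away point (position 10) stays disconnected from the others under the Gaussian rule, but the kNN rule always gives it at least one neighbor.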
The problem now states: given the graph G representing the similarity between the N objects, find a partition of all nodes into k clusters.

Many methods exist [Fortunato 10]:
- modularity (or other cost-function) optimisation methods [Newman 04];
- random walk methods [Delvenne 10];
- methods inspired from statistical physics [Krzakala 12] or information theory [Rosvall 07];
- spectral methods...
Three useful matrices

The adjacency matrix:
W = [ 0 1 1 0
      1 0 1 1
      1 1 0 0
      0 1 0 0 ]

The degree matrix:
S = [ 2 0 0 0
      0 3 0 0
      0 0 2 0
      0 0 0 1 ]

The Laplacian matrix:
L = S − W = [  2 −1 −1  0
              −1  3 −1 −1
              −1 −1  2  0
               0 −1  0  1 ]
The same three matrices on a weighted version of the graph:

W = [ 0  .5 .5 0
      .5 0  .5 4
      .5 .5 0  0
      0  4  0  0 ]

S = [ 1 0 0 0
      0 5 0 0
      0 0 1 0
      0 0 0 4 ]

L = S − W = [  1  −.5 −.5  0
              −.5  5  −.5 −4
              −.5 −.5  1   0
               0  −4   0   4 ]
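These matrices are easy to check numerically. The snippet below rebuilds the unweighted example and verifies the standard properties of the Laplacian (rows summing to zero, symmetry, a zero smallest eigenvalue):

```python
import numpy as np

# adjacency matrix of the 4-node example above
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

S = np.diag(W.sum(axis=1))   # degree matrix
L = S - W                    # (combinatorial) Laplacian

eigvals = np.linalg.eigvalsh(L)   # sorted in ascending order
```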
The classical spectral clustering algorithm [Von Luxburg 06]

Given the N-node graph G of adjacency matrix W:
1. Compute U_k = (u_1 u_2 ... u_k), the first k eigenvectors of L = S − W.
2. Consider each node i as a point in R^k: f_i = U_k^T δ_i.
3. Run k-means with the Euclidean distance D_ij = ||f_i − f_j|| and obtain k clusters.
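The three steps can be condensed into a short numpy sketch. The toy graph (two triangles joined by a single edge) and the minimal k-means inside are illustrative assumptions, not the speaker's code:

```python
import numpy as np

def spectral_clustering(W, k):
    """Steps 1-3 of the classical algorithm, on a dense adjacency matrix W."""
    L = np.diag(W.sum(1)) - W
    # step 1: the k eigenvectors of L with smallest eigenvalues
    _, U = np.linalg.eigh(L)
    F = U[:, :k]                     # step 2: node i -> row f_i in R^k
    # step 3: minimal k-means (farthest-first init) on the rows of F
    centers = [F[0]]
    for _ in range(k - 1):
        d2 = np.min([((F - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(F[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(100):
        labels = np.argmin(((F[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([F[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

# two triangles joined by a single edge: a graph with two obvious clusters
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1
labels = spectral_clustering(W, k=2)
```

On this graph the second eigenvector (the Fiedler vector) takes opposite signs on the two triangles, so the k-means step separates them cleanly.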
What's the point of using a graph? N points in d = 2 dimensions. [Figures: the result with k-means (k = 2), vs. the result after creating a graph from the N points' interdistances and running the spectral clustering algorithm (with k = 2).]
Computation bottlenecks of the spectral clustering algorithm

When N and/or k become too large, there are two main bottlenecks in the algorithm:
1. the partial eigendecomposition of the Laplacian;
2. k-means.

Our goal: circumvent both!
What's graph signal processing?
What's a graph signal? [Figures: classical signals, e.g. temperature (°C) over the months of a year and over the hours of a day, vs. the same kind of data defined on the nodes of a graph.]
What's the graph Fourier matrix? [Hammond 11]

The classical (ring) graph:
L_cl = [  2 −1  0 ...  0 −1
         −1  2 −1  0 ...  0
           ...
          0 ...  0 −1  2 −1
         −1  0 ...  0 −1  2 ]

All classical Fourier modes are the eigenvectors of L_cl. By analogy, any graph's Fourier modes are defined as the eigenvectors of its Laplacian matrix L.
The graph Fourier matrix

L = S − W. Its eigenvectors U = (u_1 u_2 ... u_N) form the graph Fourier orthonormal basis. Its eigenvalues 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_N represent the graph frequencies: λ_i is the squared frequency associated to the Fourier mode u_i.
Illustration. [Figures: a low-frequency Fourier mode vs. a high-frequency Fourier mode, plotted on the nodes of a graph.]
The Fourier transform

Given f ∈ R^N, a signal on a graph of size N, its Fourier transform f̂ is obtained by decomposing f on the eigenvectors u_i:

f̂ = ( ⟨u_1, f⟩, ⟨u_2, f⟩, ..., ⟨u_N, f⟩ )^T, i.e. f̂ = U^T f.

Conversely, the inverse Fourier transform reads f = U f̂.

The Parseval theorem stays valid: for all (g, h), ⟨g, h⟩ = ⟨ĝ, ĥ⟩.
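A quick numpy check of these definitions on the small example graph used earlier (the signal f is arbitrary):

```python
import numpy as np

# Laplacian of the 4-node example graph
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
L = np.diag(W.sum(1)) - W

lam, U = np.linalg.eigh(L)          # graph frequencies and Fourier basis

f = np.array([1.0, 2.0, 3.0, 4.0])  # a signal on the 4 nodes
f_hat = U.T @ f                     # graph Fourier transform
f_back = U @ f_hat                  # inverse transform recovers f
```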
Filtering

Given a filter function g defined in the Fourier space [figure: an example low-pass g(λ) on the spectrum], the signal filtered by g reads, in the Fourier space:

f̂_g = ( f̂(1) g(λ_1), f̂(2) g(λ_2), ..., f̂(N) g(λ_N) )^T = Ĝ f̂, with Ĝ = diag( g(λ_1), g(λ_2), ..., g(λ_N) ).

In the node space, the filtered signal f_g therefore reads:

f_g = U Ĝ U^T f = G f.
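The relation f_g = U Ĝ U^T f is easy to reproduce numerically. As a hedged example, the filter g(λ) = exp(−2λ) below is an arbitrary smooth low-pass choice:

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)

g = lambda x: np.exp(-2.0 * x)        # a smooth low-pass filter g(lambda)
G = U @ np.diag(g(lam)) @ U.T         # the filter operator in the node space

f = np.array([1.0, -1.0, 1.0, -1.0])  # a "high-frequency" signal
f_g = G @ f                           # low-pass filtering attenuates it
```

Since g(0) = 1, a constant signal (the lowest-frequency mode on a connected graph) passes through unchanged, while oscillating signals are damped.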
So where's the link?
Remember the classical spectral clustering algorithm. Given the N-node graph G of adjacency matrix W:
1. Compute U_k = (u_1 u_2 ... u_k), the first k eigenvectors of L = S − W.
2. Consider each node i as a point in R^k: f_i = U_k^T δ_i.
3. Run k-means with the Euclidean distance D_ij = ||f_i − f_j|| and obtain k clusters.

Let's work on the first bottleneck: estimate D_ij without partially diagonalizing the Laplacian matrix.
Ideal low-pass filtering. 1st step: assume we know U_k and λ_k.

Given h_λk, an ideal low-pass filter (equal to 1 on [0, λ_k] and 0 beyond), H_λk = U Ĥ_λk U^T = U_k U_k^T is its filter matrix.

Let R = (r_1 r_2 ... r_η) ∈ R^{N×η} be a random Gaussian matrix. We define f̃_i = (H_λk R)^T δ_i ∈ R^η and D̃_ij = ||f̃_i − f̃_j||.

Norm conservation theorem for the ideal filter. Let ε > 0. If η > η_0 log N / ε², then, with probability > 1 − 1/N, we have:

for all (i, j) ∈ [1, N]², (1 − ε) D_ij ≤ D̃_ij ≤ (1 + ε) D_ij.
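This construction can be simulated directly. The sketch below uses the exact U_k of a toy graph, so the ideal filter H_λk = U_k U_k^T is available explicitly; the graph, η and the random seed are illustrative choices:

```python
import numpy as np

# toy graph: two triangles joined by an edge (6 nodes)
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)

k, eta = 2, 4000
Uk = U[:, :k]
H = Uk @ Uk.T                                   # ideal low-pass filter matrix
rng = np.random.default_rng(0)
R = rng.normal(size=(6, eta)) / np.sqrt(eta)    # eta random Gaussian signals
F = H @ R                                       # row i is node i's feature vector

# exact spectral distances vs. their random estimates
D  = np.linalg.norm(Uk[:, None] - Uk[None], axis=-1)
Dt = np.linalg.norm(F[:, None] - F[None], axis=-1)
```

With η = 4000 the estimated distances D̃_ij concentrate tightly around the exact D_ij, as the norm conservation theorem predicts.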
Non-ideal low-pass filtering. 2nd step: assume all we know is λ_k.

In practice, we use a polynomial approximation of order m of h_λk:

h̃_λk(λ) = sum_{l=0}^{m} α_l λ^l ≈ h_λk(λ).

[Figure: the ideal filter and its polynomial approximations for m = 100, 20, 5.]

Indeed, in this case, filtering a vector x reads:

H̃_λk x = U h̃_λk(Λ) U^T x = U ( sum_{l=0}^{m} α_l Λ^l ) U^T x = sum_{l=0}^{m} α_l L^l x.

This does not require the knowledge of U_k, and only involves matrix-vector multiplications [cost: O(m|E|)].

The theorem stays (more or less) valid with this non-ideal filtering!
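The key identity, sum_l α_l L^l x = U h̃(Λ) U^T x, holds exactly for any polynomial, so it can be checked numerically without any approximation argument. The coefficients below are arbitrary:

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
L = np.diag(W.sum(1)) - W
lam, U = np.linalg.eigh(L)

def poly_filter(L, alpha, x):
    """Apply h(L) x = sum_l alpha_l L^l x using matrix-vector products only."""
    y = alpha[0] * x
    p = x
    for a in alpha[1:]:
        p = L @ p        # p = L^l x; on a sparse L each product costs O(|E|)
        y = y + a * p
    return y

alpha = np.array([0.8, -0.5, 0.1, -0.01])   # arbitrary polynomial coefficients
x = np.array([1.0, 2.0, -1.0, 0.5])
y_fast = poly_filter(L, alpha, x)

# same result via the (expensive) eigendecomposition route
h = sum(a * lam ** l for l, a in enumerate(alpha))
y_exact = U @ (h * (U.T @ x))
```

In the actual algorithm only the `poly_filter` route is used; the eigendecomposition route appears here only to verify the identity.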
Last step: estimate λ_k.

Goal: given L, estimate its k-th eigenvalue as fast as possible. We use eigencount techniques (also based on polynomial filtering of random vectors!): given an interval [0, b], get an approximation of the number of enclosed eigenvalues, and find λ_k by dichotomy on b.
Accelerated spectral algorithm

Given the N-node graph G of adjacency matrix W:
1. Estimate λ_k, the k-th eigenvalue of L.
2. Generate η random graph signals, stacked in a matrix R ∈ R^{N×η}.
3. Filter them with H̃_λk and treat each node i as a point in R^η: f̃_i = (H̃_λk R)^T δ_i.
4. Run k-means with the Euclidean distance D̃_ij = ||f̃_i − f̃_j|| and obtain k clusters.

Let's work on the second bottleneck: avoid running k-means on all N feature vectors (step 4), as N may be very large.
Fast spectral algorithm?

Given the N-node graph G of adjacency matrix W:
1. Estimate λ_k, the k-th eigenvalue of L.
2. Generate η random graph signals, stacked in a matrix R ∈ R^{N×η}.
3. Filter them with H̃_λk and treat each node i as a point in R^η: f̃_i = (H̃_λk R)^T δ_i.
4. Sample randomly ρ ~ k log k << N nodes out of the N, and keep only their feature vectors: f̃_i^r = (M H̃_λk R)^T δ_i^r, where M is the ρ×N sampling matrix.
5. Run k-means in this reduced space with the Euclidean distance D̃_ij^r = ||f̃_i^r − f̃_j^r|| and obtain k clusters.
6. Interpolate the cluster indicator functions c_l^r on the whole graph: c_l = argmin_{x ∈ R^N} ||Mx − c_l^r||² + µ x^T L x.
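Step 6 is a ridge-type problem with the closed form (M^T M + µL) c_l = M^T c_l^r. Below is a hedged sketch on the two-triangle toy graph; the sampled nodes, the value of µ and the 0.5 decision threshold are illustrative assumptions, not the authors' settings:

```python
import numpy as np

# two triangles joined by an edge; nodes 0-2 form cluster A, nodes 3-5 cluster B
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1
L = np.diag(W.sum(1)) - W

# sample rho = 2 nodes (one per cluster); M is the 2 x 6 sampling matrix
sampled = [0, 4]
M = np.zeros((2, 6))
M[0, sampled[0]] = M[1, sampled[1]] = 1
c_r = np.array([1.0, 0.0])   # cluster-A indicator, known on the samples only

mu = 0.1
# closed form of argmin_x ||M x - c_r||^2 + mu x^T L x
c = np.linalg.solve(M.T @ M + mu * L, M.T @ c_r)
labels = (c > 0.5).astype(int)   # threshold the interpolated indicator
```

The µ x^T L x term forces the interpolated indicator to be smooth on the graph, so the value 1 known at node 0 spreads over its triangle and the value 0 spreads over the other.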
Compressive spectral clustering: a summary
1. generate a feature vector for each node by filtering a few random Gaussian signals on G;
2. subsample the set of nodes;
3. cluster the reduced set of nodes;
4. interpolate the cluster indicator vectors back to the complete graph.
This work was done in collaboration with Gilles Puy and Rémi Gribonval from the PANAMA team (INRIA), and Pierre Vandergheynst from EPFL.

Part of this work has been published (or submitted):
- circumventing the first bottleneck has been accepted at ICASSP 2016;
- interpolation of k-bandlimited graph signals (an application of which helps us circumvent the second bottleneck) has been submitted to ACHA in November.
Perspectives and difficult questions

Two difficult questions (among others):
1. Given a positive semi-definite matrix, how to estimate its k-th eigenvalue, and only that one, as fast as possible?
2. How to subsample ρ nodes out of N while ensuring that clustering them into k classes gives the result one would have obtained by clustering all N nodes?

Perspectives:
1. What if nodes are added one by one?
2. Rational filters instead of polynomial filters?
3. Approximating other spectral clustering algorithms?