Algorithms

Here we explain the setup and algorithm when the data points reside on linear subspaces; the algorithm can also cluster affine subspaces with simple modifications (see Section 2.2 of Robust Subspace Clustering for further detail).

Setup and notation

$N$ points in $n$ dimensions, $\mathbf{Y}=[y_1, y_2, \ldots, y_N]$, normalized to have unit Euclidean norm $\|y_\ell\|_{\ell_2}=1$.

Noisy problem

$$\mathbf{Y}=\mathbf{X}+\mathbf{Z} \quad\Leftrightarrow\quad y_j=x_j+z_j$$
  • $\mathbf{X}$: clean data points. Columns of $\mathbf{X}$ belong to a union of subspaces $S_1\cup S_2\cup\ldots\cup S_L$ of unknown dimensions $d_1, d_2, \ldots, d_L$.

  • $\mathbf{Z}$: noise. Each column has bounded norm $\|z_j\|_{\ell_2}\le\sigma$.
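
To make this model concrete, here is a minimal synthetic sketch in NumPy; the sizes (two 3-dimensional subspaces in $n=20$ ambient dimensions, 40 points per subspace) and the noise level are illustrative choices, not values taken from the text.

```python
import numpy as np

# Illustrative sizes (not from the paper): two subspaces of dimension 3 in R^20.
n, d, points_per_subspace, sigma = 20, 3, 40, 0.1
rng = np.random.default_rng(0)

columns = []
for _ in range(2):                                        # one iteration per subspace S_l
    basis = np.linalg.qr(rng.standard_normal((n, d)))[0]  # orthonormal basis of S_l
    coeffs = rng.standard_normal((d, points_per_subspace))
    columns.append(basis @ coeffs)
X = np.concatenate(columns, axis=1)          # clean points lying on the union of subspaces
X /= np.linalg.norm(X, axis=0)               # unit-norm clean columns

Z = rng.standard_normal(X.shape)
Z *= sigma / np.linalg.norm(Z, axis=0)       # enforce the bound ||z_j||_2 <= sigma
Y = X + Z                                    # noisy observations
Y /= np.linalg.norm(Y, axis=0)               # normalize observed columns as in the setup
```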

 

Missing data problem

$$\mathbf{Y}=\mathcal{P}_\Omega(\mathbf{X})$$
  • $\mathbf{X}$: clean data points. Columns of $\mathbf{X}$ belong to a union of subspaces $S_1\cup S_2\cup\ldots\cup S_L$ of unknown dimensions $d_1, d_2, \ldots, d_L$.

  • $\mathcal{P}_\Omega$ is an operator that zeros out each entry with probability $\delta$.
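
A minimal sketch of such a masking operator, assuming entries are erased independently with probability $\delta$; the function name `P_Omega` and the example sizes are mine.

```python
import numpy as np

def P_Omega(X, delta, seed=0):
    """Zero out each entry of X independently with probability delta.

    Returns the masked matrix and the boolean mask of observed entries.
    """
    rng = np.random.default_rng(seed)
    observed = rng.random(X.shape) >= delta
    return X * observed, observed

# Example: erase roughly 30% of the entries of a random matrix.
X = np.random.default_rng(1).standard_normal((20, 80))
Y, observed = P_Omega(X, delta=0.3)
```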

 

General Structure

The general structure of the algorithm follows the standard machinery:

  1. Construct an affinity matrix $\mathbf{W}$ between samples, producing a weighted graph.

  2. Construct clusters by applying spectral clustering to $\mathbf{W}$.

  3. Apply PCA to each cluster.

Following Sparse Subspace Clustering (SSC) of Elhamifar and Vidal, we use ideas from sparse representation theory and sparse regression to compute affinities.
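
As a rough sketch of steps 2 and 3, assuming the affinity matrix `W` has already been computed (e.g. by one of the sparse regression strategies described below) and that the number of clusters and the per-cluster PCA dimension are supplied by the user; these are illustrative choices, not prescriptions from the text.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.decomposition import PCA

def cluster_and_fit_subspaces(Y, W, n_clusters, subspace_dim):
    """Steps 2-3: spectral clustering on the affinity W, then PCA within each cluster.

    Y has one data point per column; W is the symmetric N x N affinity matrix.
    """
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(W)
    bases = {}
    for k in range(n_clusters):
        cluster_points = Y[:, labels == k].T        # PCA expects samples as rows
        bases[k] = PCA(n_components=subspace_dim).fit(cluster_points).components_
    return labels, bases
```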

Sparse regression

We first calculate a sparse coefficient sequence $\beta^{(i)}$ obtained by regressing the $i$th data point $y_i$ onto all other data points $\mathbf{Y}_{-i}$. The hope is that such a sparse representation of $y_i$ selects only vectors from the cluster to which $y_i$ belongs. This is depicted in the figure below, with each color denoting points from a different cluster.

We propose two strategies for the sparse regression step below.

  • Two step procedure with data-driven regularization.

  • Bias-corrected Dantzig Selector.

 

Building a graph based on the sparse coefficients

We set

$$\mathbf{W}_{ij}=|\beta^{(i)}_j|+|\beta^{(j)}_i|$$

After applying a permutation that makes columns in the same cluster contiguous, we expect the affinity matrix to look like the figure below, where each block corresponds to a different cluster.
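
Assuming the coefficient vectors $\beta^{(i)}$ have been embedded in $\mathbb{R}^N$ (with a zero in position $i$) and collected in a list, the symmetrization above is a couple of lines of NumPy; the function name is mine.

```python
import numpy as np

def affinity_from_coefficients(betas):
    """betas: list of N length-N vectors, where betas[i] is beta^(i) embedded in R^N
    with a zero in position i. Returns the symmetric affinity W_ij = |b^(i)_j| + |b^(j)_i|."""
    B = np.column_stack(betas)          # B[j, i] = beta^(i)_j
    return np.abs(B) + np.abs(B).T
```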

 

Two Step Procedure with Data Driven Regularization

For $i=1,\ldots,N$:

 

We choose $\lambda_i$ in a data-driven fashion:

 

Choice of parameters: $\tau=2\sigma$, $f(t)=1/(4t)$.
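
The displayed optimization problems for the two steps are not reproduced above, so the following is only a plausible sketch of the idea: it assumes step 1 is a hard-constrained $\ell_1$ fit with tolerance $\tau$ and step 2 a LASSO whose weight is set from the step-1 solution through $f$. The exact objectives and the function `two_step_coefficients` are assumptions, not quotes from the paper.

```python
import numpy as np
import cvxpy as cp

def two_step_coefficients(Y, i, sigma):
    """Hypothetical sketch of the two-step procedure for column i of Y (points as columns)."""
    y_i = Y[:, i]
    Y_rest = np.delete(Y, i, axis=1)
    tau, f = 2 * sigma, lambda t: 1.0 / (4 * t)     # parameter choices quoted in the text

    # Step 1 (assumed form): hard-constrained l1 fit of y_i by the other points.
    b1 = cp.Variable(Y_rest.shape[1])
    cp.Problem(cp.Minimize(cp.norm(b1, 1)),
               [cp.norm(y_i - Y_rest @ b1, 2) <= tau]).solve()

    # Step 2 (assumed form): LASSO with data-driven weight lambda_i = f(||step-1 solution||_1).
    lam_i = f(np.linalg.norm(b1.value, 1))
    b2 = cp.Variable(Y_rest.shape[1])
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y_i - Y_rest @ b2)
                           + lam_i * cp.norm(b2, 1))).solve()
    return b2.value
```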

Bias-corrected Dantzig Selector

Again we regress each point against the others, this time via the Dantzig selector.

$$\beta^{(i)}=\arg\min_{\beta} \|\beta\|_{\ell_1}\quad\text{subject to}\quad \|\gamma^{(i)}-\mathbf{\Gamma}^{(i)}\beta\|_{\infty}\le\lambda,$$

where $\gamma^{(i)}=\mathbf{X}_{-i}^T x_i$ and $\mathbf{\Gamma}^{(i)}=\mathbf{X}_{-i}^T\mathbf{X}_{-i}$.

In practice we only get to observe the noisy versions $y_i$ and $\mathbf{Y}_{-i}$.

Solution: Build unbiased estimators for $\gamma^{(i)}$ and $\mathbf{\Gamma}^{(i)}$. Set

$$\hat{\gamma}^{(i)}=\mathbf{Y}_{-i}^T y_i,\quad \hat{\mathbf{\Gamma}}^{(i)}=\mathbf{Y}_{-i}^T\mathbf{Y}_{-i}-\sigma^2\mathbf{I}.$$

Finally, we arrive at the following sequence of optimization problems:

$$\beta^{(i)}=\arg\min_{\beta} \|\beta\|_{\ell_1}\quad\text{subject to}\quad \|\hat{\gamma}^{(i)}-\hat{\mathbf{\Gamma}}^{(i)}\beta\|_{\infty}\le\lambda.$$

Choice of parameters: $\lambda=\sqrt{32/n}\,\sigma\sqrt{1+\sigma^2}$.
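
A minimal sketch of this step using cvxpy; the function name and the use of a generic convex solver are my choices, while the bias correction and the value of $\lambda$ follow the formulas above.

```python
import numpy as np
import cvxpy as cp

def dantzig_coefficients(Y, i, sigma):
    """Bias-corrected Dantzig selector for column i of the noisy data Y (points as columns)."""
    n = Y.shape[0]
    y_i = Y[:, i]
    Y_rest = np.delete(Y, i, axis=1)

    gamma_hat = Y_rest.T @ y_i                                          # estimates X_{-i}^T x_i
    Gamma_hat = Y_rest.T @ Y_rest - sigma**2 * np.eye(Y_rest.shape[1])  # bias-corrected Gram matrix
    lam = np.sqrt(32.0 / n) * sigma * np.sqrt(1 + sigma**2)             # parameter choice from the text

    beta = cp.Variable(Y_rest.shape[1])
    cp.Problem(cp.Minimize(cp.norm(beta, 1)),
               [cp.norm(gamma_hat - Gamma_hat @ beta, "inf") <= lam]).solve()
    return beta.value
```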

Bias-corrected Dantzig Selector for missing data

$\Omega_i\subset\{1,2,\ldots,n\}$ denotes the set of observed entries of the $i$-th column of the clean data matrix $\mathbf{X}$.

$\mathbf{X}_{\Omega_i}$ denotes the submatrix of $\mathbf{X}$ with rows selected by $\Omega_i$.

Again we regress each point against the others, this time via the Dantzig selector.

$$\beta^{(i)}=\arg\min_{\beta} \|\beta\|_{\ell_1}\quad\text{subject to}\quad \|\gamma^{(i)}-\mathbf{\Gamma}^{(i)}\beta\|_{\infty}\le\lambda,$$

where $\gamma^{(i)}=\mathbf{X}_{\Omega_i}^T x^{(i)}_{\Omega_i}$ and $\mathbf{\Gamma}^{(i)}=\mathbf{X}_{\Omega_i}^T\mathbf{X}_{\Omega_i}$.

In practice we only get to observe a few entries.

Solution: Again build unbiased estimators for $\gamma^{(i)}$ and $\mathbf{\Gamma}^{(i)}$.

We build a matrix $\mathbf{Y}$ based on the observed entries of $\mathbf{X}$ as follows: set $Y_{ij}=\frac{X_{ij}}{1-\delta}$ if the entry is observed and $Y_{ij}=0$ if it is missing.

Set $\hat{\mathbf{\Gamma}}^{(i)}=\mathbf{Y}_{\Omega_i}^T\mathbf{Y}_{\Omega_i}-\delta\,\mathrm{diag}(\mathbf{Y}_{\Omega_i}^T\mathbf{Y}_{\Omega_i})$ with the $i$th row and column set to zero.

Set $\hat{\gamma}^{(i)}=\mathbf{Y}_{\Omega_i}^T y^{(i)}_{\Omega_i}$ with the $i$th entry set to zero.

Finally, we arrive at

$$\beta^{(i)}=\arg\min_{\beta} \|\beta\|_{\ell_1}\quad\text{subject to}\quad \|\hat{\gamma}^{(i)}-\hat{\mathbf{\Gamma}}^{(i)}\beta\|_{\infty}\le\lambda.$$

Choice of parameters: $\lambda=\sqrt{\frac{2\log N}{n}}\,\frac{\bar{\delta}}{1-\bar{\delta}}$, where $\bar{\delta}$ is the number of missing entries divided by the total number of entries.
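
A minimal sketch of the missing-data variant using cvxpy. It assumes the missing probability $\delta$ is unknown and is replaced everywhere by the empirical fraction of missing entries $\bar{\delta}$; the function name and input conventions (points as columns, plus a boolean mask of observed entries) are mine.

```python
import numpy as np
import cvxpy as cp

def dantzig_coefficients_missing(X_obs, observed, i):
    """Sketch of the missing-data Dantzig selector for column i.

    X_obs: data matrix with unobserved entries already set to zero (points as columns).
    observed: boolean mask of observed entries, same shape as X_obs.
    """
    n, N = X_obs.shape
    delta_bar = 1.0 - observed.mean()                  # empirical fraction of missing entries
    Y = X_obs / (1.0 - delta_bar)                      # rescale observed entries, zeros stay zero

    omega_i = observed[:, i]                           # rows observed in column i
    Y_omega = Y[omega_i, :]
    G = Y_omega.T @ Y_omega
    Gamma_hat = G - delta_bar * np.diag(np.diag(G))    # bias-correct the diagonal
    Gamma_hat[i, :] = 0.0
    Gamma_hat[:, i] = 0.0                              # zero out the i-th row and column
    gamma_hat = Y_omega.T @ Y_omega[:, i]
    gamma_hat[i] = 0.0                                 # zero out the i-th entry

    lam = np.sqrt(2 * np.log(N) / n) * delta_bar / (1 - delta_bar)  # parameter choice from the text
    beta = cp.Variable(N)
    cp.Problem(cp.Minimize(cp.norm(beta, 1)),
               [cp.norm(gamma_hat - Gamma_hat @ beta, "inf") <= lam]).solve()
    return beta.value
```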