Laplacian eigenmaps

The Laplace operator $\Delta$ on smooth functions on $\mathbb{R}^n$ may be written in a variety of ways:

$$\Delta f = \operatorname{div}(\nabla f) = \nabla \cdot \nabla f = \frac{\partial^2 f}{\partial x_1^2} + \cdots + \frac{\partial^2 f}{\partial x_n^2}$$

We would like to extend the concept of Δf to

  1. smooth manifolds (coordinate independent version)
  2. graphs (combinatorial version)
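Before either extension, the coordinate formula can be illustrated numerically. A minimal sketch (all values illustrative) approximating $\Delta f$ for $f(x,y) = x^2 + y^2$ with the standard 5-point finite-difference stencil; since $\partial^2 f/\partial x^2 = \partial^2 f/\partial y^2 = 2$, the Laplacian should be $4$ everywhere:

```python
import numpy as np

h = 0.01                                 # grid spacing (illustrative)
x = np.arange(-1, 1, h)
X, Y = np.meshgrid(x, x, indexing="ij")
f = X**2 + Y**2

# 5-point stencil: (f[i+1,j] + f[i-1,j] + f[i,j+1] + f[i,j-1] - 4 f[i,j]) / h^2
lap = (f[2:, 1:-1] + f[:-2, 1:-1] + f[1:-1, 2:] + f[1:-1, :-2]
       - 4 * f[1:-1, 1:-1]) / h**2

print(lap.mean())  # ≈ 4
```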

01 Theory - Laplace-Beltrami operator

The extension of the concept of Δ to smooth manifolds is called the Laplace-Beltrami operator. It is still defined as the divergence of the gradient, but these too must be upgraded to the context of manifolds.

Write $C^\infty(M)$ for the space of smooth functions $M \to \mathbb{R}$. Write $\Omega^n(M)$ for the space of differential $n$-forms on $M$. The exterior derivative is a map:

$$d_n : \Omega^n(M) \to \Omega^{n+1}(M)$$

and the co-differential is a map:

$$\delta_{n+1} : \Omega^{n+1}(M) \to \Omega^n(M)$$

(It is defined using the Hodge star operator.) Then $\Delta f = (\delta \circ d)(f)$.

The spectrum of $\Delta$ is the set of eigenvalues $\lambda \in \mathbb{R}$, i.e. the numbers for which there is a nonzero eigenfunction $f : M \to \mathbb{R}$ satisfying:

$$\Delta f = \lambda f$$

If you know the complete spectrum of M, then you know a lot about the shape of M (although you may not be able to recover M completely).
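For example, on the circle $S^1$ (with the sign convention $\Delta = -\,d^2/d\theta^2$) the spectrum is completely known:

```latex
\Delta \cos(n\theta) = n^2 \cos(n\theta), \qquad
\Delta \sin(n\theta) = n^2 \sin(n\theta), \qquad n = 0, 1, 2, \ldots
```

so the spectrum is $\{n^2 : n \geq 0\}$; for a circle of radius $r$ the eigenvalues become $n^2/r^2$, so here the spectrum does recover the shape, though for general manifolds it need not.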

02 Theory - Laplacian for graphs

In spectral graph theory one studies the spectrum of a combinatorial extension of the Laplacian concept to a graph.

We start with a weighted graph:

  • Vertices $\{x_1, \ldots, x_n\}$
  • Edges with weights $w_{ij}$

The adjacency matrix $A$ is the matrix with entries $w_{ij}$ off the diagonal, and zeros on the diagonal. (No self-loops.)


The graph Laplacian $\Delta$ is the matrix with $-w_{ij}$ in the off-diagonal entries, and the diagonal entries given as follows:

$$D_{ii} = \sum_{\substack{j=1 \\ j \neq i}}^{n} w_{ij}$$

Then:

$$\Delta = D - A$$
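A minimal sketch of this construction for a small weighted graph (the weights are illustrative):

```python
import numpy as np

# Symmetric adjacency matrix with zero diagonal (no self-loops).
A = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])

D = np.diag(A.sum(axis=1))   # D_ii = sum of weights incident to vertex i
L = D - A                    # graph Laplacian

print(L)
```

Two standard sanity checks: every row of the Laplacian sums to zero, and its eigenvalues are nonnegative.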

Explanation:

For a graph G, let

  • $C_1(G)$ be the vector space with the edges of $G$ as basis
  • $C_0(G)$ be the vector space with the vertices of $G$ as basis

Then define

$$d_1 : C_1(G) \to C_0(G)$$

by $d : (x_i, x_j) \mapsto x_j - x_i$.

There is an adjoint (transpose) map $C_0(G) \to C_1(G)$.
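For a small unweighted graph this can be checked concretely: the matrix of $d$ is the oriented incidence matrix $B$, and composing $d$ with its transpose recovers the graph Laplacian $D - A$. A sketch with an illustrative path graph $x_1$ — $x_2$ — $x_3$:

```python
import numpy as np

# Edges e1 = (x1, x2), e2 = (x2, x3); one column per edge,
# with -1 at the source vertex and +1 at the target (d sends e to target - source).
B = np.array([[-1,  0],
              [ 1, -1],
              [ 0,  1]])

L = B @ B.T   # equals D - A for this graph
print(L)
# [[ 1 -1  0]
#  [-1  2 -1]
#  [ 0 -1  1]]
```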

Laplacian eigenmaps process

Let us be given a scale σ and a target dimension k.

Form a weighted graph:

  • Vertices $\{x_1, \ldots, x_m\}$
  • Edge weights $w_{ij}$ as follows:

$$w_{ij} = \begin{cases} \exp\!\left(-\dfrac{d_M(x_i, x_j)^2}{\sigma}\right) & d_M(x_i, x_j) < \sigma \\ 0 & \text{otherwise} \end{cases}$$

Let $W = (w_{ij})$ and define $D$ by $D_{ii} = \sum_j w_{ij}$. The graph Laplacian is $\Delta = D - W$.

Consider the matrix whose columns are the eigenvectors corresponding to the $k$ smallest nonzero eigenvalues of $\Delta$. The rows of this matrix define points $y_i \in \mathbb{R}^k$. Define $\theta$ by assigning

$$\theta : x_i \mapsto y_i$$

This definition of $y_1, \ldots, y_m$ minimizes the error:

$$\mathcal{E} = \sum_{i,j} |y_i - y_j|^2 \, w_{ij}$$

Given the definition of wij, this error penalizes most strongly nearby points xi,xj that are sent to distant points yi,yj.
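The whole process can be condensed into a short numpy sketch (data, scale $\sigma$, and target dimension $k$ are all illustrative; the standard convention of taking eigenvectors for the smallest nonzero eigenvalues is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))          # data cloud in R^5 (illustrative)
sigma, k = 4.0, 2                     # scale and target dimension (illustrative)

# Truncated Gaussian weights: exp(-d^2 / sigma) when d < sigma, else 0.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.where(d2 < sigma**2, np.exp(-d2 / sigma), 0.0)
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))
L = D - W

# The constant vector has eigenvalue 0; skip it and keep the next k eigenvectors.
vals, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
Y = vecs[:, 1:k + 1]                  # rows are the embedded points y_i in R^k
print(Y.shape)                        # (50, 2)
```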

The Laplacian eigenmaps process does not work well for spheres and other noncontractible spaces that are not covered by a single coordinate chart.


UMAP

Uniform Manifold Approximation and Projection

The goal of UMAP is to rebuild data living in a high-dimensional space $\mathbb{R}^N$ as:

  • Low-dimensional manifold
    • The data is first cast as a weighted graph (1-dimensional)
  • Uniform sampling assumption
    • Dense clusters are spread apart and sparse areas contracted

03 Theory - UMAP process

We are given a data cloud $\{x_1, \ldots, x_n\} \subset \mathbb{R}^N$, and a neighbor count $k$.

(1) Create weighted partial graphs local to each data point:


For each point $x_i$, with nearest neighbors $y_1, \ldots, y_k$ (ordered by distance), define weights as follows:

  • Set $w(x_i, y_1) = 1$.
  • All $y_j$ are infinitely far from each other: $w(y_j, y_{j'}) = 0$ for $j \neq j'$.
  • For $j \geq 2$:
    • Require that $w(x_i, y_j)$ decreases with the shifted distance $d(x_i, y_j) - d(x_i, y_1)$,
    • Require that $\sum_{j=1}^{k} w(x_i, y_j) = \log_2 k$.

That is, the nearest neighbor $y_1$ is placed at distance $0$ from $x_i$, and the distances to the other $k-1$ neighbors are shifted toward $x_i$ by the same amount. For each fixed $i$, the collection $\{w(x_i, y_j)\}_{j=1,\ldots,k}$ encodes the metric structure of the original data near $x_i$, but with the closest point $y_1$ substituted for $x_i$.

The weight $w(x_i, y_j)$ is supposed to be interpreted as the probability that the point $y_j$ will lie in the same cluster as the point $x_i$.

From these requirements it is possible to deduce a formula for $w(x_i, y_j)$, as follows:

Set $d_1 = d(x_i, y_1)$. (Distance to the closest neighbor.) For each $i$, solve for $\sigma_i$ such that:

$$\sum_{j=1}^{k} \exp\!\left(-\frac{\max(0,\, d(x_i, y_j) - d_1)}{\sigma_i}\right) = \log_2 k$$

The idea is that for the closest neighbor $y_1$, the weight $w(x_i, y_1)$ should be the highest; the weight should decrease for neighbors that are farther away.
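The left-hand side grows monotonically in $\sigma_i$ (from $1$ toward $k$), so $\sigma_i$ can be found by bisection. A sketch with illustrative distances:

```python
import numpy as np

def solve_sigma(dists, n_iter=64):
    """Bisect for sigma so that sum_j exp(-max(0, d_j - d_1)/sigma) = log2(k)."""
    d1 = dists.min()
    target = np.log2(len(dists))
    lo, hi = 1e-12, 1e6
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        s = np.exp(-np.maximum(0.0, dists - d1) / mid).sum()
        if s > target:
            hi = mid          # weights sum too high: shrink sigma
        else:
            lo = mid
    return 0.5 * (lo + hi)

dists = np.array([0.5, 1.0, 1.5, 2.0, 3.0])   # distances to k = 5 neighbors
sigma = solve_sigma(dists)
total = np.exp(-np.maximum(0.0, dists - dists.min()) / sigma).sum()
print(total, np.log2(5))                       # total ≈ log2(5)
```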


(2) Glue partial metrics into a single weighted graph:

Now we define a single weighted graph with vertices the points $x_1, \ldots, x_n$ and weights coming from the set of partial graphs defined in Step (1).

Observe that a given pair of vertices $x_i, x_j$ could appear with finite distance in at most two partial graphs. The relation of being “nearest neighbor” is not in general symmetric:


Now, let us be given a pair of vertices $(x_i, x_j)$.

If this pair appears (with finite weight) in only one partial graph, give the pair a weighted edge in the total graph with the same weight it had in that partial graph.

If this pair appears in two partial graphs, then $x_i$ is the center point in one of them and $x_j$ is the center in the other.


We combine the weights as follows:

  • Let $p_1 = w(x_i, x_j)$
  • Let $p_2 = w(x_j, x_i)$

Then the weight on the pair $(x_i, x_j)$ shall be defined as:

$$1 - (1 - p_1)(1 - p_2) = p_1 + p_2 - p_1 p_2$$

Now we have defined a total weighted graph that combines the information from all the weighted partial (local) graphs.


(3) Embed the total weighted graph $G$ in a lower-dimensional space $\mathbb{R}^k$ as a weighted graph $H$.

This can be done with standard techniques that we don’t cover here. Briefly:

Start with the spectral graph embedding. Write $p_e$ for the weight of edge $e$ in $G$, and $q_e$ for its weight in $H$. Cross-entropy formula:

$$\sum_e p_e \log\!\left(\frac{p_e}{q_e}\right) + (1 - p_e) \log\!\left(\frac{1 - p_e}{1 - q_e}\right)$$

Use this cross-entropy as a cost function and perform stochastic gradient descent to minimize the cost.
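The cost function itself is easy to write down; a sketch (the clipping constant `eps` is an illustrative guard against logs of 0):

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Sum over edges of p log(p/q) + (1-p) log((1-p)/(1-q))."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return np.sum(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q)))

p = np.array([0.9, 0.2, 0.5])
print(cross_entropy(p, p))                           # 0 when q matches p exactly
print(cross_entropy(p, np.array([0.5, 0.5, 0.5])))   # positive otherwise
```

The cost is zero exactly when the embedded edge weights $q_e$ reproduce the original weights $p_e$, which is why minimizing it pushes $H$ to mimic $G$.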

04 Theory - Gradient descent

Suppose we are given a cost function $f : M \to \mathbb{R}$.

Given any $x \in M$, the gradient of $f$ at $x$, written $\nabla f$, is the direction of the largest rate of change of $f$:

$$\mathbf{v} = \nabla f = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right)$$

The direction of the smallest rate of change is $-\nabla f$; gradient descent steps repeatedly in this direction to minimize $f$.
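A minimal sketch of plain gradient descent (the cost function and step size are illustrative):

```python
import numpy as np

def grad(p):
    """Gradient of f(x, y) = x^2 + 4 y^2, namely (2x, 8y)."""
    x, y = p
    return np.array([2 * x, 8 * y])

p = np.array([3.0, 2.0])      # starting point (illustrative)
lr = 0.1                      # step size (illustrative)
for _ in range(200):
    p = p - lr * grad(p)      # step against the gradient
print(p)                      # ≈ (0, 0), the minimizer of f
```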


05 Illustration

UMAP is based on an assumption that the data is sampled uniformly from a manifold.

Hausdorff manifolds?

Consider $X = (\mathbb{R} \setminus \{0\}) \cup \{B\} \cup \{R\}$, the line with two origins $B$ and $R$, equipped with its standard non-Hausdorff topology:


So the points $B$ and $R$ cannot be separated by disjoint neighborhoods.

But this can actually happen with data! Suppose flowers come in two colors, red and blue, but the value (lightness) ranges from very dark (black) to very light (white):


06 Illustration


(Böhm et al. 2022)