
LLEs for those in a hurry

by Nikhil R, on 06/02/2025

Introduction

Locally Linear Embedding (LLE), a classical unsupervised manifold learning technique, aims to uncover low-dimensional representations of high-dimensional data while preserving local geometric relationships. It rests on the intuitive principle that each point and its neighbours, sampled from a smooth underlying manifold, lie on or close to a locally linear patch. A global embedding is computed by minimizing a sum-of-squares cost function, first over local reconstructions of each point, and then over the target embeddings. Compared with traditional dimensionality reduction techniques such as Principal Component Analysis (PCA), LLE is more effective at capturing complex, nonlinear structure, which makes it well suited to applications in image processing and pattern recognition.

Notation

In this article, for any matrix, $\|\cdot\|_F^2$ denotes the squared Frobenius norm and $\|\cdot\|_2^2$ the squared L2-norm. $M_p(\mathbb{R})$ is the set of all $p \times p$ square matrices over the reals, and $M_{m \times n}(\mathbb{R})$ is the set of all $m \times n$ matrices over the reals. The term $\partial_x f$ denotes the partial derivative of $f$ with respect to $x$, and $\nabla f$ the gradient. If $A = \nabla_X f(X)$, where $f(X)$ is a scalar function of a matrix $X$, then $A_{ij} = \partial_{X_{ij}} f(X)$.

Construction

Proposition 3.1. A Gradient Identity

Suppose $A \in M_n(\mathbb{R})$ is symmetric and $x \in \mathbb{R}^n$. Then the function $f = x^T A x$ has gradient $\nabla f = 2Ax$.

Proof: Let $A = [a_1 \dots a_n]$ and $v = Ax$. Then $\partial_{x_i} f = \hat{e}_i^T(Ax) + x^T \partial_{x_i}(Ax)$ by the product rule for bilinear operators. But $\hat{e}_i^T v = v_i$, and $x^T \partial_{x_i}(Ax) = x^T a_i = x \cdot a_i = v_i$ since $A$ is symmetric. So $\partial_{x_i} f = 2v_i$.

Remark: The matrix identity $\nabla_X \operatorname{tr}(X^T A X) = 2AX$ also follows.
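
As a quick sanity check of the proposition, the following sketch (an illustrative aside, not part of the original derivation) compares the claimed gradient $2Ax$ against central finite differences of $f(x) = x^T A x$ for a randomly generated symmetric $A$:

```python
import numpy as np

# Numerical check of Proposition 3.1 for a random symmetric A.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                      # symmetrize
x = rng.standard_normal(n)

analytic = 2 * A @ x                   # gradient claimed by the proposition

eps = 1e-6                             # central finite differences of f(x) = x^T A x
numeric = np.array([
    ((x + eps * e) @ A @ (x + eps * e) - (x - eps * e) @ A @ (x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(analytic, numeric, atol=1e-5))   # expected: True
```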

Definition 3.1: Reconstruction

A vector $x$ is said to be reconstructed by 'neighbours' $v_1, v_2, \dots, v_k$ with weights $w \in \mathbb{R}^k$ when $\|x - [v_1 \dots v_k]w\|_2^2$ is minimized subject to $\sum_i w_i = 1$. This is denoted $w = \psi(x; v_1, \dots, v_k)$.

Proposition 3.2.

Consider a vector $x$ and neighbours $v_i \in \mathbb{R}^d$. Let $G$ be the Gram matrix of $(x\mathbf{1}^T - [v_1 \dots v_k])$. Then:

$$\psi(x; v_1, \dots, v_k) = \frac{G^{-1}\mathbf{1}}{\mathbf{1}^T G^{-1}\mathbf{1}}$$

when G is invertible.

Proof: Consider the variable vector $w$ and the objective $\varepsilon(w) = \|x - [v_1 \dots v_k]w\|_2^2 = \|x - Xw\|_2^2$. Rewrite $\varepsilon(w)$ using $\mathbf{1}^T w = 1$ and apply $\|a\|_2^2 = a^T a$, giving $((x\mathbf{1}^T - X)w)^T((x\mathbf{1}^T - X)w)$, so $\varepsilon(w) = w^T G w$. The extremum, which must be a minimum, occurs when $\nabla\varepsilon = \lambda\mathbf{1}$, where $\lambda$ is the Lagrange multiplier for the constraint $\mathbf{1}^T w = 1$. By Proposition 3.1, $\nabla\varepsilon = 2Gw$; since $G$ is invertible we can write $2Gw = \lambda\mathbf{1} \Rightarrow w = \frac{\lambda}{2}G^{-1}\mathbf{1}$. The constraint $\mathbf{1}^T w = \frac{\lambda}{2}\mathbf{1}^T G^{-1}\mathbf{1} = 1$ gives $\lambda = \frac{2}{\mathbf{1}^T G^{-1}\mathbf{1}}$. The result follows by substitution.

Remark: The invertibility of $G$ is an important pre-condition, which is discernibly difficult to guarantee in practice since it requires linear independence of the displacements to the neighbours. Furthermore, $G$ has rank at most $\min\{d, k\}$, so full rank requires $d \geq k$ (as is so often the case). This problem is side-stepped by introducing a regularization parameter $\epsilon \in \mathbb{R}^+$ to obtain $G' = G + \epsilon I$, which is invertible since $G$ is positive semi-definite.
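
Proposition 3.2, together with the regularization just described, translates directly into code. The following is a minimal NumPy sketch; the function name and the default value of $\epsilon$ are illustrative choices rather than anything fixed by the text:

```python
import numpy as np

def reconstruction_weights(x, V, eps=1e-3):
    """Sketch of w = psi(x; v_1, ..., v_k) with regularization.

    x   : (d,) vector to reconstruct
    V   : (d, k) matrix whose columns are the neighbours v_1, ..., v_k
    eps : regularization parameter epsilon
    """
    d, k = V.shape
    D = x[:, None] - V                  # columns are x - v_j, i.e. x 1^T - [v_1 ... v_k]
    G = D.T @ D + eps * np.eye(k)       # regularized Gram matrix G' = G + eps * I
    w = np.linalg.solve(G, np.ones(k))  # proportional to G^{-1} 1
    return w / w.sum()                  # normalize: G^{-1} 1 / (1^T G^{-1} 1)
```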

Definition 3.2: k-NN Graph

The k-NN graph for a set of points $x_1, x_2, \dots, x_n \in \mathbb{R}^d$ is generated from $k$-nearest neighbours in the following manner:

  1. Let the vertices of the graph be $v_1, v_2, \dots, v_n$.
  2. Let $N$ be the array of indices such that, for $i \leq n$ and $j \leq k$, $N_{ij}$ is the index of the vector that is the $j$-th nearest neighbour of $x_i$.
  3. For each $x_i$, compute $w = \psi(x_i; x_{N_{i1}}, \dots, x_{N_{ik}})$, then add an edge from $v_i$ to $v_{N_{ij}}$ with weight $w_j$ for $j = 1, \dots, k$.

k-NN graph for points sampled from the surface of the unit sphere with clipped parameters $\theta, \phi \sim \mathcal{N}(\frac{\pi}{4}, \frac{\pi}{8})$

Sampled points visualized on the surface
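
Putting Definition 3.2 together with the `reconstruction_weights` sketch above (assumed to be in scope), one possible brute-force construction of the weight matrix $W$, suitable only for small $n$, might look like this:

```python
import numpy as np

def knn_weight_matrix(X, k, eps=1e-3):
    """Sketch of Definition 3.2: weighted adjacency matrix W of the k-NN graph.

    X : (n, d) array whose rows are the points x_1, ..., x_n
    Returns W with W[i, j] equal to the weight of the edge from v_i to v_j (0 if absent).
    """
    n = X.shape[0]
    # Brute-force pairwise distances; N[i] lists the k nearest neighbours of x_i.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # a point is not its own neighbour
    N = np.argsort(dists, axis=1)[:, :k]

    W = np.zeros((n, n))
    for i in range(n):
        w = reconstruction_weights(X[i], X[N[i]].T, eps)   # psi from the earlier sketch
        W[i, N[i]] = w
    return W
```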

Definition 3.3: Embedding points

The points $y_1, y_2, \dots, y_n \in \mathbb{R}^p$ are embedding points for a k-NN graph with adjacency matrix $W$ when $\frac{1}{n}YY^T = I$ and $\|Y(I - W)^T\|_F^2$ is minimized, where $Y = [y_1, y_2, \dots, y_n]$.

Proposition 3.3

The columns of $Y^T \in M_{n \times p}(\mathbb{R})$ are the eigenvectors corresponding to the $p$ lowest non-zero eigenvalues of $M = (I - W)^T(I - W)$.

Proof: Since $\|A\|_F^2 = \operatorname{tr}(AA^T)$, we have the objective $\varepsilon(Y) = \operatorname{tr}(YMY^T)$. The Lagrange function, expressing both the objective and the constraint, is $L = \operatorname{tr}(YMY^T) - \operatorname{tr}(YY^T\Lambda) + \operatorname{tr}(\Lambda)$, where $\Lambda \in M_p(\mathbb{R})$ is an upper-triangular matrix of coefficients corresponding to the unit-covariance constraint on the embeddings. Should we choose $\Lambda$ to be symmetric, the present form allows differentiation with respect to $Y^T$, yielding the target eigenvalue problem: $\partial_{Y^T} L = 2MY^T - 2Y^T\Lambda = 0 \Rightarrow MY^T = Y^T\Lambda$. Collecting a full set of such stationary solutions gives $M = \tilde{Y}^T\tilde{\Lambda}(\tilde{Y}^T)^{-1}$, where $Y \in M_{p \times n}(\mathbb{R})$ is a submatrix of $\tilde{Y} \in M_n(\mathbb{R})$, i.e. an eigendecomposition of $M$ in which the columns of $Y^T$ are eigenvectors. Minimization of the objective then requires the columns of $Y^T$ to correspond to the lowest eigenvalues.

Remark: The matrix $M$ is the Gram matrix over the Laplacian $(I - W)$ of the k-NN graph, and shares the null space of $(I - W)$. Eigenvectors in the null space occur as a consequence of the k-NN graph comprising multiple connected components, and do not capture characteristics of the sampling manifold. Furthermore, since $[1 \dots 1]^T$ is always such a solution and the remaining eigenvectors are orthogonal to it, the resulting embeddings must have a mean of 0.
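
Assuming the graph has a single connected component (so the null space of $M$ is spanned by the constant vector alone), a minimal NumPy sketch of Proposition 3.3 could be:

```python
import numpy as np

def embedding_points(W, p):
    """Sketch of Proposition 3.3: rows of the returned (n, p) array are the embeddings y_i."""
    n = W.shape[0]
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(M)
    # Skip the single null-space eigenvector (the constant vector [1 ... 1]),
    # assuming one connected component, and keep the next p eigenvectors.
    Yt = eigvecs[:, 1:p + 1]
    return np.sqrt(n) * Yt              # scale so that (1/n) Y Y^T = I
```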

Definition 3.4: Out-of-Sample Embedding

The embedding of a vector $x$ not encountered in the sample $x_1, \dots, x_n$ with embeddings $y_i$ is $[y_{n_1}, y_{n_2}, \dots, y_{n_k}]w$, where $w = \psi(x; x_{n_1}, \dots, x_{n_k})$ and $x_{n_1}, \dots, x_{n_k}$ are the $k$ nearest neighbours of $x$.
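
This definition reuses $\psi$ unchanged, so a hypothetical implementation on top of the earlier `reconstruction_weights` sketch (names and the default $\epsilon$ are placeholders) is only a few lines:

```python
import numpy as np

def embed_out_of_sample(x, X, Y, k, eps=1e-3):
    """Sketch of Definition 3.4.

    x : (d,) new vector not in the sample
    X : (n, d) sampled points x_1, ..., x_n (rows)
    Y : (n, p) their embeddings y_1, ..., y_n (rows)
    """
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]   # k nearest neighbours of x
    w = reconstruction_weights(x, X[idx].T, eps)           # w = psi(x; x_{n_1}, ..., x_{n_k})
    return Y[idx].T @ w                                    # [y_{n_1} ... y_{n_k}] w
```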
