Eigenfunction analysis of kernels

Overview

This post describes kernels through the lens of their eigenfunctions. The hope is that, after going through this, the reader appreciates just how powerful kernels are, and the central role they play in Gaussian process models.

Code
### Data 
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display, HTML

Mercer’s theorem (kernels → feature maps)

Now, we shall be interested in mapping from a kernel to a feature map. This leads us to Mercer’s theorem, which states that a symmetric function $k(x, x')$ can be expressed as the inner product

$$
k(x, x') = \phi^{\top}(x)\, \phi(x') = \langle \phi(x), \phi(x') \rangle
$$

for some feature map $\phi$ if and only if $k(x, x')$ is positive semidefinite, i.e.,

$$
\int \int k(x, x')\, g(x)\, g(x')\, dx\, dx' \geq 0
$$

for all real-valued $g$.
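
This condition can be probed numerically: for any finite set of input points, the resulting Gram matrix must have non-negative eigenvalues (up to floating-point error). Below is a minimal sketch; the random points and the lengthscale of 0.5 are arbitrary choices for illustration.

Code
# Numerical probe of positive semidefiniteness: the Gram matrix of a
# valid kernel should have no significantly negative eigenvalues
rng = np.random.default_rng(0)
xs = rng.uniform(-2, 2, size=50)
gram = np.exp(-0.5 * (xs[:, None] - xs[None, :])**2 / 0.5**2)
print(np.linalg.eigvalsh(gram).min())  # >= 0 up to floating-point error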

One possible set of features corresponds to eigenfunctions. A function $\nu(x)$ that satisfies the integral equation

$$
\int k(x, x')\, \nu(x)\, dx = \lambda\, \nu(x')
$$

is termed an eigenfunction of the kernel $k$. In the expression above, $\lambda$ is the corresponding eigenvalue. While the integral above is taken with respect to $x$, more formally it can be taken with respect to either a measure with density $\rho(x)$, or the Lebesgue measure over a compact subset of $\mathbb{R}^D$, which reduces to $dx$. The eigenfunctions form an orthonormal basis, and thus

$$
\int \nu_i(x)\, \nu_j(x)\, dx = \delta_{ij},
$$

where $\delta_{ij}$ is the Kronecker delta: its value is 1 when $i = j$ and zero otherwise. Thus, one can define a kernel using its eigenfunctions,

$$
k(x, x') = \sum_{i=1}^{\infty} \lambda_i\, \nu_i(x)\, \nu_i(x').
$$
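
To connect this integral equation with the matrix computations that follow, one can discretize the integral on a uniform grid of points with spacing $\Delta$, which turns the eigenproblem into a matrix one, $\Delta K u = \lambda u$. The matrix eigenvalues, scaled by $\Delta$, approximate the operator eigenvalues, and the eigenvectors, scaled by $1/\sqrt{\Delta}$, approximate the eigenfunction values at the grid points. Below is a minimal sketch of this Nyström-style approximation; the grid, lengthscale, and kernel are illustrative choices.

Code
# Nyström-style discretization of the integral eigenproblem
M = 200
xg = np.linspace(-2, 2, M)
delta = xg[1] - xg[0]  # quadrature weight of the uniform grid
Kg = np.exp(-0.5 * (xg[:, None] - xg[None, :])**2 / 0.5**2)

# Discretized operator: the integral of k(x, x') nu(x) dx becomes delta * Kg @ nu
evals, evecs = np.linalg.eigh(delta * Kg)
evals, evecs = evals[::-1], evecs[:, ::-1]  # sort in descending order

# Approximate eigenfunction values at the grid points
nu = evecs / np.sqrt(delta)

# Orthonormality under the quadrature rule: delta * nu_i . nu_j ~ delta_ij
print(np.allclose(delta * nu.T @ nu, np.eye(M)))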

Numerical solution

If the covariance matrix is already available, one can write its eigendecomposition

$$
K = V \Lambda V^{\top}
$$

where $V$ is the matrix formed by the eigenvectors of $K$ and $\Lambda$ is a diagonal matrix of its eigenvalues, i.e.,

$$
V = \begin{bmatrix} \vert & \vert & & \vert \\ v_1 & v_2 & \cdots & v_N \\ \vert & \vert & & \vert \end{bmatrix}, \quad \text{and} \quad \Lambda = \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_N \end{bmatrix},
$$

where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N \geq 0$. This decomposition permits one to express $K$ as a sum of rank-one terms,

$$
K = \sum_{i=1}^{N} \left( \sqrt{\lambda_i}\, v_i \right) \left( \sqrt{\lambda_i}\, v_i \right)^{\top}.
$$

Beyond numerical solutions, for many kernels there exist analytical expressions for the eigenvalues and eigenfunctions. For further details, please see page 97 of Rasmussen and Williams (RW). For now, we simply consider numerical solutions, as shown below.

Code
N = 30
x = np.linspace(-2, 2, N).reshape(N, 1)
# Matrix of squared distances between all pairs of inputs
R = (np.tile(x, [1, N]) - np.tile(x.T, [N, 1]))**2
l = 0.5  # lengthscale
# Squared exponential kernel matrix
K = np.exp(-0.5 * R / l**2)

fig = plt.figure()
d = plt.imshow(K)
plt.colorbar(d)
plt.title('Squared exponential')
plt.show()

Code
# Eigendecomposition of the symmetric kernel matrix; eigh returns
# eigenvalues in ascending order, so re-sort them in descending order
Lambda, V = np.linalg.eigh(K)
idx = Lambda.argsort()[::-1]
lambdas = Lambda[idx]
V = V[:, idx]
Code
fig = plt.figure()
plt.semilogy(lambdas, 'o-')
plt.ylabel('Eigenvalues of covariance matrix (log)')
plt.xlabel('Eigenvalue index')
plt.show()
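
The rapid decay above suggests that only a handful of eigenvectors carry meaningful weight, which motivates truncating the expansion. One rough way to quantify this is to count the eigenvalues that exceed a small fraction of the largest one; the tolerance below is an arbitrary choice.

Code
# Ad hoc "effective rank": number of eigenvalues above a relative tolerance
tol = 1e-6
effective_rank = np.sum(lambdas > tol * lambdas[0])
print(f"{effective_rank} of {N} eigenvalues exceed {tol:g} * lambda_1")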

Code
T = 5  # number of terms in the truncated basis
K_approx = np.zeros((N, N))
for i in range(T):
    # Rank-one contribution of the i-th scaled eigenvector
    feature = (np.sqrt(lambdas[i]) * V[:, i]).reshape(N, 1)
    K_approx += feature @ feature.T
Code
fig = plt.figure()
plt.subplot(121)
d = plt.imshow(K_approx, vmin=0, vmax=1)
plt.colorbar(d, shrink=0.3)
plt.title('Truncated approximation (5 terms)')

plt.subplot(122)
e = plt.imshow(K, vmin=0, vmax=1)
plt.colorbar(e, shrink=0.3)
plt.title('Squared exponential')
plt.show()
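
To see how quickly the truncated expansion approaches the full kernel matrix, one can also track the reconstruction error as terms are added. Below is a minimal sketch reusing lambdas and V from above; the Frobenius norm is one convenient choice of error metric, and tiny negative eigenvalues (floating-point artifacts) are clipped before taking the square root.

Code
# Frobenius-norm error of the rank-T approximation as T grows
errors = []
K_running = np.zeros((N, N))
for i in range(N):
    # Clip tiny negative eigenvalues before the square root
    feature = (np.sqrt(max(lambdas[i], 0.0)) * V[:, i]).reshape(N, 1)
    K_running += feature @ feature.T
    errors.append(np.linalg.norm(K - K_running, 'fro'))

fig = plt.figure()
plt.semilogy(np.arange(1, N + 1), errors, 'o-')
plt.xlabel('Number of terms T')
plt.ylabel('Frobenius error (log)')
plt.show()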