This attempts to describe kernels. The hope is that, after going through this, the reader will appreciate just how powerful kernels are, and the central role they play in Gaussian process models.
### Data

```python
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display, HTML
```
### Mercer’s theorem (kernels → feature maps)
Now, we shall be interested in mapping from a kernel to a feature map. This leads us to Mercer’s theorem, which states that: A symmetric function $k(x, x')$ can be expressed as the inner product

$$
k(x, x') = \langle \phi(x), \phi(x') \rangle
$$

for some feature map $\phi$ if and only if $k(x, x')$ is positive semidefinite, i.e.,

$$
\sum_{i=1}^{n} \sum_{j=1}^{n} c_i \, c_j \, k(x_i, x_j) \geq 0
$$

for all real $c_1, \ldots, c_n$ and all inputs $x_1, \ldots, x_n$.
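As an illustrative sketch of this condition (the RBF kernel, input range, and tolerance below are my own choices, not prescribed by the text), one can check numerically that the quadratic form is nonnegative for random coefficient vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(x1, x2, lengthscale=1.0):
    # Squared-exponential (RBF) kernel, used here purely as an example
    # of a positive-semidefinite kernel.
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / lengthscale**2)

x = rng.uniform(-3, 3, size=50)
K = rbf_kernel(x, x)

# The quadratic form c^T K c should be >= 0 for every real vector c
# (up to floating-point rounding error).
quad_forms = [c @ K @ c for c in rng.normal(size=(100, 50))]
print(min(quad_forms) >= -1e-10)  # prints: True
```

Of course, a finite sample of random vectors cannot prove positive semidefiniteness; it only fails to falsify it. Checking that all eigenvalues of `K` are nonnegative (up to rounding) is the equivalent exact test for a fixed Gram matrix.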
One possible set of features corresponds to eigenfunctions. A function $\phi_i(\cdot)$ that satisfies the integral equation

$$
\int k(x, x') \, \phi_i(x) \, p(x) \, dx = \lambda_i \, \phi_i(x')
$$

is termed an eigenfunction of the kernel $k$. In the expression above, $\lambda_i$ is the corresponding eigenvalue. While the integral above is taken with respect to $p(x)\,dx$, more formally, it can be taken with respect to either a density $p(x)$, or the Lebesgue measure over a compact subset of $\mathbb{R}^d$, which reduces to $p(x) = 1$. The eigenfunctions form an orthogonal basis and thus

$$
\int \phi_i(x) \, \phi_j(x) \, p(x) \, dx = \delta_{ij},
$$

where $\delta_{ij}$ is the Kronecker delta. When $i = j$, its value is $1$; zero otherwise. Thus, one can define a kernel using its eigenfunctions

$$
k(x, x') = \sum_{i=1}^{\infty} \lambda_i \, \phi_i(x) \, \phi_i(x').
$$
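A discrete analogue of this expansion can be sketched with a Gram matrix: the (orthonormal) eigenvectors of the matrix play the role of sampled eigenfunctions, and truncating the sum to the leading eigenpairs already yields a close approximation. The RBF kernel and grid below are illustrative assumptions, not taken from the text:

```python
import numpy as np

x = np.linspace(-3, 3, 40)
# Illustrative RBF Gram matrix.
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

# eigh returns eigenvalues in ascending order for a symmetric matrix;
# flip to descending so the leading terms come first.
eigvals, eigvecs = np.linalg.eigh(K)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

def truncated_kernel(m):
    # Discrete analogue of k(x, x') = sum_i lambda_i phi_i(x) phi_i(x'),
    # truncated to the m leading eigenpairs.
    return (eigvecs[:, :m] * eigvals[:m]) @ eigvecs[:, :m].T

for m in (5, 10, 40):
    err = np.abs(K - truncated_kernel(m)).max()
    print(m, err)
```

Because the RBF kernel's eigenvalues decay rapidly, the approximation error drops sharply as `m` grows, and retaining all 40 eigenpairs recovers `K` to machine precision.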
### Numerical solution
If the covariance matrix $K$ is already available, one can write its eigendecomposition

$$
K = U \Lambda U^{\top},
$$

where $U$ is a matrix formed by the eigenvectors of $K$ and $\Lambda$ is a diagonal matrix of its eigenvalues, i.e.,

$$
\Lambda = \text{diag}\left(\lambda_1, \ldots, \lambda_n\right),
$$

where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0$. This expansion permits one to express each element of $K$ as

$$
K_{ij} = \sum_{s=1}^{n} \lambda_s \, U_{is} \, U_{js}.
$$
Beyond numerical solutions, for many kernels there exist analytical solutions for the eigenvalues and eigenfunctions. For further details, please see page 97 of RW. For now, we simply consider numerical solutions as shown below.
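As a minimal numerical sketch (the RBF kernel and sample locations below are assumptions chosen for illustration), one can form a Gram matrix, compute its eigendecomposition with `np.linalg.eigh`, and verify both the matrix identity $K = U \Lambda U^{\top}$ and the element-wise sum:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-2, 2, size=30))
# Illustrative RBF Gram matrix.
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

# eigh is appropriate here since K is symmetric.
lam, U = np.linalg.eigh(K)
K_rebuilt = U @ np.diag(lam) @ U.T

# Element-wise form: K_ij = sum_s lambda_s U_is U_js.
i, j = 3, 17
k_ij = np.sum(lam * U[i, :] * U[j, :])

print(np.allclose(K, K_rebuilt), np.isclose(K[i, j], k_ij))  # prints: True True
```

Note that `eigh` returns eigenvalues in ascending order; if the descending convention $\lambda_1 \geq \cdots \geq \lambda_n$ matters (e.g., for truncation), reverse `lam` and the columns of `U` accordingly.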