Fourier analysis of kernels

Overview

This note concerns the Fourier dual of a kernel, i.e., one can think of a kernel has having an associated set of frequencies and amplitudes, much like studying the spectral density or power spectrum of a signal.

Code
import plotly.graph_objects as go
import plotly.figure_factory as ff
import plotly.express as px
import pandas as pd
import plotly.io as pio
import numpy as np
from scipy.stats import multivariate_normal
import matplotlib.pyplot as plt
pio.renderers.default = 'iframe'
from IPython.display import display, HTML

Bochner’s theorem

A stationary function \(k\left( \mathbf{x}, \mathbf{x}' \right)=k \left( \mathbf{x} - \mathbf{x}' \right) = k \left( \boldsymbol{\tau} \right)\) can be represented as the Fourier transform of a positive finite measure. The formal statement is given by

A complex-valued function \(k\) on \(\mathcal{X}\) is the covariance function of a weakly stationary mean square continuous complex-valued random process on \(\mathcal{X}\) if and only if it can be represented as

\[ k \left( \boldsymbol{\tau} \right) = \int_{\mathcal{X}} exp \left( 2 \pi i \boldsymbol{\omega} \cdot \boldsymbol{\tau} \right) d \mu \left( \boldsymbol{\omega} \right) \]

where \(\mu\) is a positive finite measure, and \(\boldsymbol{\omega}\) are the frequencies. If \(\mu\) has a density \(S \left( \boldsymbol{\omega} \right)\), then \(S\) is the spectral density or power spectrum associated with the kernel \(k\).

Wiener-Khintchine theorem

A direct consequence of Bochner’s theoreom is the Wiener-Khintchine theorem. If the spectral density \(S \left( \boldsymbol{\omega} \right)\) exists, the spectral density and the covariance function are said to be Fourier duals. This leads to the following statement:

\[ k \left( \boldsymbol{\tau} \right) = \int S \left( \boldsymbol{\omega} \right) exp \left( 2 \pi i \boldsymbol{\omega} \cdot \boldsymbol{\tau} \right) d \boldsymbol{\omega}, \; \; \; \; S \left( \boldsymbol{\omega}\right) = \int k \left( \boldsymbol{\tau} \right) exp\left(- 2 \pi i \boldsymbol{\omega} \cdot \boldsymbol{\tau} \right) d \boldsymbol{\tau} \]

As noted in RW, \(S \left( \boldsymbol{\omega} \right)\) is essentially the amount of power assigned to the eigenfunction \(exp \left( 2 \pi i \boldsymbol{\omega} \cdot \mathbf{\tau} \right)\) with frequency \(\boldsymbol{\omega}\). The amplitude as a function of frequency \(S\left( \boldsymbol{\omega} \right)\) must decay sufficiently fast so that the terms above are integrable.

There are some important points to note:

  1. If we have a stationary kernel, we can resolve what frequencies underscore the model by working out its Fourier transform.
  2. On the other hand, if we have a certain spectral density of interest, then its inverse Fourier transform is a kernel.

To analytically work this out, it may be useful to go through an example (courtsey of Markus Heinonen). The derivation below will require three pieces: - We shall assume a symmetric frequency distribution, i.e., \(S\left( \boldsymbol{\omega} \right) = S \left( -\boldsymbol{\omega} \right)\). - From Euler’s formula we have \(cos\left(x\right) \pm i sin\left(x \right) = exp \left(\pm ix \right)\) - The negative sine identity, i.e., \(sin \left( -x \right) = - sin \left( x \right)\)

Starting with the expression above, we begin wtih

\[ \begin{aligned} k \left( \boldsymbol{\tau} \right) & = \int_{-\infty}^{\infty} S \left( \boldsymbol{\omega} \right) exp \left( 2 \pi i \boldsymbol{\omega} \cdot \boldsymbol{\tau} \right) d \boldsymbol{\omega} \\ & = \int_{-\infty}^{\infty} S \left(\boldsymbol{\omega} \right) cos \left( 2 \pi\boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) d \boldsymbol{\omega} + \int_{-\infty}^{\infty} iS \left(\boldsymbol{\omega} \right) sin \left( 2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) d \boldsymbol{\omega} \\ & = \mathbb{E}\left[ S \left(\omega \right) \right] cos \left( 2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) + \int_{-\infty}^{0} iS \left(\boldsymbol{\omega} \right) sin \left( 2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) d \boldsymbol{\omega} + \int_{0}^{\infty} iS \left(\boldsymbol{\omega} \right) sin \left( 2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) d \boldsymbol{\omega} \\ & = \mathbb{E}\left[ S \left(\omega \right) \right] cos \left( 2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) + \int_{0}^{\infty} iS \left(-\boldsymbol{\omega} \right) sin \left( -2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) d \boldsymbol{\omega} + \int_{0}^{\infty} iS \left(\boldsymbol{\omega} \right) sin \left( 2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) d \boldsymbol{\omega} \\ & = \mathbb{E}\left[ S \left(\omega \right) \right] cos \left( 2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) + \int_{0}^{\infty} -iS \left(\boldsymbol{\omega} \right) sin \left( 2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) d \boldsymbol{\omega} + \int_{0}^{\infty} iS \left(\boldsymbol{\omega} \right) sin \left( 2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) d \boldsymbol{\omega} \\ \end{aligned} \]

This leads to

\[ \begin{aligned} k \left( \boldsymbol{\tau} \right) & = \mathbb{E}\left[ S \left(\omega \right) \right] cos \left( 2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) \end{aligned} \]

This demonstrates that all real-valued stationary kernels are \(S\left( \boldsymbol{\omega} \right)\)-weighted combinations of cosine terms.

Kernel harmonic representation

Our new general stationary kernel definition is thus:

\[ k \left( \boldsymbol{\tau} \right) = \mathbb{E}\left[ S \left(\omega \right) \right] cos \left( 2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) \]

where the frequencies \(\boldsymbol{\omega}\) are an inverse of the period \(1/\boldsymbol{\omega}\). Bracewell provides the following expressions for the Wiener-Khintchine result, by integrating out the angular variables (see page 83 of RW):

\[ \begin{aligned} k \left( \boldsymbol{\tau} \right) & = \frac{2 \pi}{\boldsymbol{\tau}^{-1/2}} \int_{0}^{\infty} S \left( \boldsymbol{\omega} \right) J_{-1/2} \left(2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) \boldsymbol{\omega}^{1/2} d \boldsymbol{\omega} \\ S \left( \boldsymbol{\omega} \right) & = \frac{2 \pi}{\boldsymbol{\omega}^{-1/2}} \int_{0}^{\infty} k \left( \boldsymbol{\tau} \right) J_{-1/2} \left(2 \pi \boldsymbol{\tau} \cdot \boldsymbol{\omega} \right) \boldsymbol{\tau}^{1/2} d \boldsymbol{\tau} \end{aligned} \]

Note that in RW, the authors use \(D\) to denote the dimensionality, which we have assumed to be 1. The function \(J_{-1/2}\) is the Bessel function of order \(-1/2\). While the expressions above may seem unwiedly, we can work out what these are using a bit of Sympy. Consider the case of a squared exponential kernel of the form

\[ k \left(\boldsymbol{\tau} \right) = exp \left(- \frac{\boldsymbol{\tau}^2}{2l^2} \right). \]

Code
from sympy import * 

omega = Symbol("omega")
ell = Symbol("l")
tau = Symbol("tau")

kernel = exp(- tau**2 / (2 * ell**2))
integrate(2*pi*omega**(1/2) * kernel * besselj(-1/2, 2*pi*tau*omega)*tau**(1/2), (tau, 0, oo))

\(\displaystyle \begin{cases} 1.4142135623731 \pi^{0.5} l^{1.0} e^{- 2 \pi^{2} l^{2} \omega^{2}} & \text{for}\: \left(\left|{\arg{\left(\omega \right)}}\right| = 0 \wedge \left|{\arg{\left(l \right)}}\right| < \frac{\pi}{4}\right) \vee \left|{\arg{\left(l \right)}}\right| < \frac{\pi}{4} \\\int\limits_{0}^{\infty} 2 \pi \omega^{0.5} \tau^{0.5} e^{- \frac{\tau^{2}}{2 l^{2}}} J_{-0.5}\left(2 \pi \omega \tau\right)\, d\tau & \text{otherwise} \end{cases}\)

The first expression above is the Fourier amplitude of the squared exponential kernel, i.e.,

\[ S \left(\boldsymbol{\omega} \right) = \left( 2 \pi l^2\right)^{1/2} exp \left( - 2 \pi l^2 \boldsymbol{\omega}^2 \right) \]

Code
omega = np.linspace(0, np.pi/4, 50)
l = 0.5
S_omega = (2 * np.pi * l**2)**(1/2) * \
            np.exp(- 2 * np.pi * l**2 * omega**2)
tau = np.linspace(0, 10, 200)


fig = go.Figure()
fig.add_scatter(x=omega, y=S_omega, mode='lines')
fig.update_layout(title='Spectral density', \
                  xaxis_title=r'Frequency, $\omega$',\
                  yaxis_title=r'Spectral density, $S\left( \omega \right) $')
fig.show()
Code
kernel = tau * 0.
true_kernel = np.exp(-tau**2 / l**2)
counter = 0.

fig = go.Figure()
for omega_j in omega:
    counter += 1.
    label=str(np.around(int(counter), 1))+' terms'
    S_omega_j = (2 * np.pi * l**2)**(1/2) * \
            np.exp(- 2 * np.pi * l**2 * omega_j**2)
    cos_term = np.cos(2 * np.pi * tau * omega_j)
    kernel += (S_omega_j * cos_term)
    fig.add_scatter(x=tau, y=kernel * 1/counter, name=label, mode='lines')
fig.add_scatter(x=tau, y=true_kernel, name='Kernel', mode='lines', \
                line=dict(width=4, color='black'))
fig.update_layout(title='Sq. exp kernel Fourier representation', \
                  xaxis_title=r'Distance, $\tau$',\
                  yaxis_title=r'$k ( \tau )$')
fig.show()

Notice that the more terms we incorporate, the closer we converge to the true kernel.

Random Fourier Features

Rather than negotiate a kernel approximation with a great many number of terms, it will be more instructive to resort to a few terms. Such is the idea behind Random Fourier Features, where one selects a kernel comprised of random frequencies. For more details, please see the paper by Rahimi and Recht.

Code
R = 500 # random features
D = 50 # number of data pts.
x = np.linspace(-2*np.pi, 2*np.pi, D).reshape(D,1) # grid
X = np.tile(x, [1, D]) - np.tile(x.T, [D, 1])
W    = np.random.normal(loc=0, scale=0.1, size=(R, D))
b    = np.random.uniform(0, 2*np.pi, size=R)
B    = np.repeat(b[:, np.newaxis], D, axis=1)
norm = 1./ np.sqrt(R)
Z    = norm * np.sqrt(2) * np.cos(W @ X.T + B)
ZZ   = Z.T @ Z
Code
fig = plt.figure(figsize=(14,5))
plt.subplot(121)
d = plt.imshow(ZZ)
plt.colorbar(d, shrink=0.3)
plt.title('Random Fourier Features')
normal = multivariate_normal(np.zeros((D)), ZZ, allow_singular=True)
plt.subplot(122)
plt.plot(x, normal.rvs(10).T )
plt.title('Random samples from prior')
plt.xlabel('x')
plt.show()