Overview

Course Description:

This graduate-level course offers a practical approach to probabilistic learning with Gaussian processes (GPs). GPs represent a powerful set of methods for modeling and predicting a wide variety of spatio-temporal phenomena. Today, they are used for problems that span both regression and classification, with theoretical foundations in Bayesian inference, reproducing kernel Hilbert spaces, eigenvalue problems, and numerical integration. Rather than focusing solely on these theoretical foundations, this course balances theory with practical probabilistic programming using a variety of Python-based packages. The course also discusses practical engineering problems in which GP models intersect with other areas of machine learning, including transfer learning, convolutional networks, and normalizing flows.

Grading

This course has four assignments; the grade breakdown is given below:

Assignment | Grade percentage (%)
Assignment 1: Take-home mid-term (covering fundamentals) | 20
Assignment 2: Build your own GP from scratch for a given dataset | 20
Assignment 3: Proposal | 20
Assignment 4: Final project (presentation and notebook) | 40

Pre-requisites:

  • CS1371, MATH2551, MATH2552 (or equivalent)
  • Working knowledge of Python, including familiarity with the NumPy and Matplotlib libraries.
  • A working local installation of Python and Jupyter.

Lectures

Below you will find a list of the lectures that form the backbone of this course. Sub-topics for each lecture will be updated in due course.

01.08: L1. Introduction & probability fundamentals | Slides | Examples
Contents
  1. Course overview.
  2. Probability fundamentals (and Bayes’ theorem).
  3. Random variables.
01.10: L2. Discrete probability distributions | Slides | Examples | Notebook
Contents
  1. Expectation and variance.
  2. Independence.
  3. Bernoulli and Binomial distributions.

01.15: No Class (Institute Holiday)

01.17: L3. Continuous distributions | Slides | Examples
Contents
  1. Fundamentals of continuous random variables.
  2. Probability density function.
  3. Gaussian and Beta distributions.
01.22: L4. Manipulating and combining distributions | Slides | Examples
Contents
  1. Functions of random variables.
  2. Sums of random variables.

01.24: No Class

01.29: L5. Multivariate Gaussian distributions | Slides
Contents
  1. Marginal distributions.
  2. Conditional distributions.
  3. Joint distribution and Schur complement (see the identity below).
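For reference, the conditioning identity underlying items 2 and 3 (a standard result, written here in generic block notation rather than the notation used in the slides) is:

    % Conditioning a joint Gaussian: the conditional covariance is the
    % Schur complement of \Sigma_{yy} in the joint covariance matrix.
    \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}
      \sim \mathcal{N}\!\left(
        \begin{bmatrix} \boldsymbol{\mu}_x \\ \boldsymbol{\mu}_y \end{bmatrix},
        \begin{bmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\
                        \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{bmatrix}
      \right)
    \;\;\Longrightarrow\;\;
    \mathbf{x} \mid \mathbf{y} \sim \mathcal{N}\!\left(
        \boldsymbol{\mu}_x + \boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}(\mathbf{y} - \boldsymbol{\mu}_y),\;
        \boldsymbol{\Sigma}_{xx} - \boldsymbol{\Sigma}_{xy}\boldsymbol{\Sigma}_{yy}^{-1}\boldsymbol{\Sigma}_{yx}
      \right).
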
01.31: L6. Linear modelling | Slides
Contents
  1. Least squares.
  2. Regularization.
  3. Gaussian noise model.
02.05: L7. Towards Bayesian Inference | Slides
Contents
  1. Posterior mean and covariance for a linear model.
  2. Fisher information matrix.
  3. Bayesian model introduction.
  4. Posterior definition.
02.07: L8. Bayesian inference in action | Slides
Contents
  1. Analytical calculation of the posterior
  2. Conjugacy in Bayesian inference
  3. A function-space perspective

02.12: Fundamentals Mid-term (take-home)

02.12: L9. An introduction to Gaussian Processes | Slides | Notebook
Contents
  1. Gaussian process prior
  2. Noise-free regression (a minimal sketch follows below)
  3. Kernel functions
  4. Midterm overview
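To give a flavour of the notebook for this lecture, here is a minimal, self-contained sketch of noise-free GP regression with a squared-exponential kernel (the lengthscale, variance, and toy data below are illustrative assumptions, not the values used in class):

    import numpy as np

    def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
        """Squared-exponential (RBF) kernel between two sets of 1D inputs."""
        sq_dists = (X1[:, None] - X2[None, :]) ** 2
        return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

    # Toy training data: noise-free observations of a sine function
    X_train = np.array([-2.0, -1.0, 0.5, 2.0])
    y_train = np.sin(X_train)
    X_test = np.linspace(-3, 3, 100)

    # Covariance blocks of the GP prior
    K = rbf_kernel(X_train, X_train) + 1e-8 * np.eye(len(X_train))  # jitter for stability
    K_s = rbf_kernel(X_train, X_test)
    K_ss = rbf_kernel(X_test, X_test)

    # Posterior mean and covariance via the Gaussian conditioning identity
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # pointwise predictive std
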
02.14: L10. More on Gaussian Processes and Kernels | Slides | Notebook
Contents
  1. Noisy regression
  2. More about kernels
  3. Kernel trick

02.19: No class

02.21: No class

02.26: L11. More about Kernels | Notebook 1 | Notebook 2 | Notebook 3
Contents
  1. Minimum norm problems
  2. The case of infinitely many feature vectors
  3. Eigenfunction analysis
  4. Fourier analysis

02.28: Coding assignment issued

02.28: L12. Hyperparameter inference | Slides
Contents
  1. MAP
  2. Marginal likelihood
  3. Introduction to gpytorch and pymc (a short gpytorch sketch follows below)
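As a pointer towards the gpytorch part of this lecture, the sketch below fits an exact GP by maximising the log marginal likelihood, in the style of the gpytorch documentation (the toy data, kernel choice, and optimiser settings are illustrative assumptions):

    import torch
    import gpytorch

    # Toy data: noisy observations of a sine function
    train_x = torch.linspace(0, 1, 50)
    train_y = torch.sin(train_x * 6.0) + 0.1 * torch.randn(train_x.size(0))

    class ExactGPModel(gpytorch.models.ExactGP):
        def __init__(self, train_x, train_y, likelihood):
            super().__init__(train_x, train_y, likelihood)
            self.mean_module = gpytorch.means.ConstantMean()
            self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

        def forward(self, x):
            return gpytorch.distributions.MultivariateNormal(
                self.mean_module(x), self.covar_module(x)
            )

    likelihood = gpytorch.likelihoods.GaussianLikelihood()
    model = ExactGPModel(train_x, train_y, likelihood)

    # Fit hyperparameters by maximising the (log) marginal likelihood
    model.train()
    likelihood.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
    mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
    for _ in range(100):
        optimizer.zero_grad()
        loss = -mll(model(train_x), train_y)
        loss.backward()
        optimizer.step()
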
03.04: L13. Markov chain Monte Carlo | Notebook
Contents
  1. MAP vs MCMC
  2. Metropolis (a minimal sketch follows this list)
  3. Metropolis-Hastings
  4. HMC and NUTS
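As a small taster of the samplers listed above, here is a minimal random-walk Metropolis sketch for a one-dimensional target (the target density, proposal scale, and chain length are arbitrary illustrative choices):

    import numpy as np

    def log_target(x):
        """Unnormalised log-density of a toy target (standard normal here)."""
        return -0.5 * x**2

    def metropolis(log_p, x0=0.0, n_samples=5000, proposal_scale=1.0, seed=0):
        """Random-walk Metropolis with a symmetric Gaussian proposal."""
        rng = np.random.default_rng(seed)
        samples = np.empty(n_samples)
        x = x0
        log_p_x = log_p(x)
        for i in range(n_samples):
            x_prop = x + proposal_scale * rng.standard_normal()
            log_p_prop = log_p(x_prop)
            # Accept with probability min(1, p(x')/p(x)); the symmetric proposal cancels
            if np.log(rng.uniform()) < log_p_prop - log_p_x:
                x, log_p_x = x_prop, log_p_prop
            samples[i] = x
        return samples

    samples = metropolis(log_target)
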
03.06: L14. Approximate inference | Slides | Notebook
Contents
  1. Review of approximate inference methods.
  2. KL divergence (the Gaussian closed form is stated below)
  3. Variational inference
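For reference, the KL divergence between two k-dimensional Gaussians, which appears repeatedly in variational inference, has the closed form:

    \mathrm{KL}\!\left(\mathcal{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)\,\|\,\mathcal{N}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)\right)
      = \tfrac{1}{2}\left[
          \operatorname{tr}\!\left(\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\Sigma}_0\right)
          + (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)^{\top}\boldsymbol{\Sigma}_1^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)
          - k
          + \ln\frac{\det \boldsymbol{\Sigma}_1}{\det \boldsymbol{\Sigma}_0}
        \right].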

03.08: L15. A Gaussian Process Case Study | Slides

03.13: Withdrawal Deadline

03.18-03.22: Spring Break

03.25: L16. Scaling Gaussian Processes (Linear Algebra Perspective) | Slides
Contents
  1. Nyström approximation (see the sketch below).
  2. Kronecker product structure.
  3. Toeplitz structure.
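To illustrate the Nyström idea listed above, here is a small sketch that builds a low-rank approximation of a kernel matrix from a subset of landmark points (the kernel, data, and number of landmarks are placeholder choices):

    import numpy as np

    def rbf_kernel(X1, X2, lengthscale=1.0):
        sq_dists = (X1[:, None] - X2[None, :]) ** 2
        return np.exp(-0.5 * sq_dists / lengthscale**2)

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=500)                   # full dataset (n points)
    Z = X[rng.choice(len(X), size=30, replace=False)]  # m landmark points

    K_nm = rbf_kernel(X, Z)                            # n x m cross-covariance
    K_mm = rbf_kernel(Z, Z) + 1e-8 * np.eye(len(Z))    # m x m block (with jitter)

    # Nystrom approximation: K is approximated by K_nm @ inv(K_mm) @ K_nm.T,
    # a rank-m matrix, so downstream solves cost O(n m^2) instead of O(n^3).
    K_approx = K_nm @ np.linalg.solve(K_mm, K_nm.T)
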
03.27: L17. Scaling Gaussian Processes II (Probabilistic Perspective) | Slides | Notebook
Contents
  1. Bayesian inference review.
  2. Deterministic training conditional (DTC).
  3. Fully independent training conditional (FITC).
04.01: L18. Gaussian process classification | Slides
Contents
  1. Classification likelihood.
  2. MAP via Newton-Raphson.

04.03: L19. Live coding session | Code coming up shortly!
Contents
  1. Newton-Raphson classification example.
  2. A simple multi-task model.

04.08: L20. Multi-task and Physics-Constrained Kernels | Slides | Notebook 1 | Notebook 2
Contents
  1. Model of coregionalization.
  2. Divergence-free kernel.
  3. Curl-free kernel.
04.10: L21. Time series Gaussian processes | Slides
Contents
  1. Kalman filtering.
  2. Spatio-temporal Gaussian processes.
  3. Equivalence.

04.08: L23. Guest Lecture

04.22: L24. Project presentations

Office hours

Professor Seshadri’s office hours:

Location | Time
MK 421 | Fridays, 14:30 to 15:30

Textbooks

This course will make heavy use of the following texts:

  • Rasmussen, C. E., Williams, C. K. Gaussian Processes for Machine Learning, The MIT Press, 2006.
  • Murphy, K. P., Probabilistic Machine Learning: Advanced Topics, The MIT Press, 2023.

Both these texts have been made freely available by the authors.

Important papers

Students are encouraged to read through the following papers:

References

Material used in this course has been adapted from

  • CUED Part IB probability course notes
  • Aalto University's module on Gaussian Processes
  • Slides from the Gaussian Process Summer Schools