Matthieu Bloch
Tuesday, November 15, 2022
Gaussian processes are a powerful tool to model unknown functions and learn them from samples
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution
A Gaussian process is completely specified by a mean function \(m(\vecx)\) and a covariance function \(k(\vecx,\vecx')\) of a real-valued process \(f(\vecx)\) such that \[ m(\vecx)\eqdef \E{f(\vecx)}\qquad k(\vecx,\vecx') \eqdef \E{(f(\vecx)-m(\vecx))(f(\vecx')-m(\vecx'))} \]
Possible to generalize to vector-valued functions (more on this later), often assume \(m(\vecx)=0\)
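To see the definition in action, here is a minimal sketch (Python/NumPy; the squared-exponential covariance \(k(x,x')=\exp(-(x-x')^2/2\ell^2)\), the grid, and the jitter term are illustrative choices, not part of the notes) that draws sample paths from a zero-mean Gaussian process by sampling the joint Gaussian at a finite set of inputs:

```python
import numpy as np

def sq_exp_kernel(xs, ys, ell=1.0):
    # Squared-exponential covariance k(x, x') = exp(-(x - x')^2 / (2 ell^2))
    return np.exp(-(xs[:, None] - ys[None, :]) ** 2 / (2 * ell**2))

# Any finite collection of inputs has a joint Gaussian distribution,
# so a "sample path" on a grid is just one multivariate Gaussian draw.
x = np.linspace(-5, 5, 200)
K = sq_exp_kernel(x, x)
L = np.linalg.cholesky(K + 1e-10 * np.eye(len(x)))  # jitter for numerical stability
rng = np.random.default_rng(0)
samples = L @ rng.standard_normal((len(x), 3))      # three sample paths, m(x) = 0
```

Since \(\matL\matL^T = \matK\) (up to jitter), each column of `samples` has covariance \(\matK\), as required.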
Suppose that \[ \matM = \left[\begin{array}{cc}\matM_{11}&\matM_{12}\\\matM_{21}&\matM_{22}\end{array}\right]. \] If \(\matM_{22}\) is invertible, the Schur complement of \(\matM_{22}\) in \(\matM\) is \(\matS_{22} \eqdef \matM_{11}-\matM_{12}\matM_{22}^{-1}\matM_{21}\).
If \(\matM_{11}\) is invertible, the Schur complement of \(\matM_{11}\) in \(\matM\) is \(\matS_{11} \eqdef \matM_{22}-\matM_{21}\matM_{11}^{-1}\matM_{12}\).
\[ \matM^{-1} = \left[\begin{array}{cc}\matS_{22}^{-1}&-\matS_{22}^{-1}\matM_{12}\matM_{22}^{-1}\\-\matM_{22}^{-1}\matM_{21}\matS_{22}^{-1}&\matM_{22}^{-1}+\matM_{22}^{-1}\matM_{21}\matS_{22}^{-1}\matM_{12}\matM_{22}^{-1}\end{array}\right]. \] \[ \matM^{-1} = \left[\begin{array}{cc}\matM_{11}^{-1}+\matM_{11}^{-1}\matM_{12}\matS_{11}^{-1}\matM_{21}\matM_{11}^{-1}&-\matM_{11}^{-1}\matM_{12}\matS_{11}^{-1}\\-\matS_{11}^{-1}\matM_{21}\matM_{11}^{-1}&\matS_{11}^{-1}\end{array}\right]. \]
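As a quick numerical sanity check of the first inversion formula, here is a NumPy sketch (the block sizes and the random positive definite \(\matM\) are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
M = A @ A.T + 5 * np.eye(5)          # random symmetric positive definite matrix
M11, M12 = M[:2, :2], M[:2, 2:]
M21, M22 = M[2:, :2], M[2:, 2:]

# Schur complement of M22 in M: S22 = M11 - M12 M22^{-1} M21
M22_inv = np.linalg.inv(M22)
S22_inv = np.linalg.inv(M11 - M12 @ M22_inv @ M21)

# Block-inverse formula expressed with S22
M_inv = np.block([
    [S22_inv,                   -S22_inv @ M12 @ M22_inv],
    [-M22_inv @ M21 @ S22_inv,  M22_inv + M22_inv @ M21 @ S22_inv @ M12 @ M22_inv],
])
assert np.allclose(M_inv, np.linalg.inv(M))
```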
The distribution of \(\vecf^*\) conditioned on \(X\), \(X^*\) and \(\vecf\) is \[ \calN\left(\matK(X^*,X)\matK(X,X)^{-1}\vecf,\matK(X^*,X^*)-\matK(X^*,X)\matK(X,X)^{-1}\matK(X,X^*)\right) \] Hence we can estimate \(\vecf^*\) as \(\matK(X^*,X)\matK(X,X)^{-1}\vecf\) and we can quantify our uncertainty.
Often, we only observe \(y = f(x)+{\varepsilon}\) with \({\varepsilon}\sim\calN(0,\sigma^2)\) (i.i.d. across different measurements)
\[ \left[\begin{array}{c}\vecy\\\vecf^*\end{array}\right]\sim\calN\left(\boldsymbol{0},\left[\begin{array}{cc}\matK(X,X)+\sigma^2\matI&\matK(X,X^*)\\\matK(X^*,X)&\matK(X^*,X^*)\end{array}\right]\right) \]
The distribution of \(\vecf^*\) conditioned on \(X\), \(X^*\) and \(\vecy\) is \[ \calN\left(\matK(X^*,X)(\matK(X,X)+\sigma^2\matI)^{-1}\vecy,\matK(X^*,X^*)-\matK(X^*,X)(\matK(X,X)+\sigma^2\matI)^{-1}\matK(X,X^*)\right) \] Hence we can estimate \(\vecf^*\) as \(\matK(X^*,X)(\matK(X,X)+\sigma^2\matI)^{-1}\vecy\) and we can quantify our uncertainty.
Small simplification: for a single test point \(x^*\), writing \(\veck_* \eqdef \matK(X,x^*)\) and \(\matK \eqdef \matK(X,X)\), \[ \bar{f}^* = \veck_*^T(\matK + \sigma^2\matI)^{-1}\vecy\qquad \sigma_{f^*}^2 = k(x^*,x^*) - \veck_*^T(\matK+\sigma^2\matI)^{-1}\veck_* \]
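Combining the pieces above, a minimal regression sketch in Python/NumPy (the squared-exponential kernel is repeated from the earlier sketch; the training data, noise level \(\sigma^2\), and lengthscale are illustrative assumptions, not values from the notes):

```python
import numpy as np

def sq_exp_kernel(xs, ys, ell=1.0):
    return np.exp(-(xs[:, None] - ys[None, :]) ** 2 / (2 * ell**2))

# Noisy observations y = f(x) + eps, eps ~ N(0, sigma^2) i.i.d.
X = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])
y = np.sin(X) + 0.1 * np.random.default_rng(2).standard_normal(len(X))
sigma2 = 0.01

Xs = np.linspace(-5, 5, 100)                       # test points X*
K = sq_exp_kernel(X, X) + sigma2 * np.eye(len(X))  # K(X, X) + sigma^2 I
Ks = sq_exp_kernel(Xs, X)                          # K(X*, X)
Kss = sq_exp_kernel(Xs, Xs)                        # K(X*, X*)

alpha = np.linalg.solve(K, y)
mean = Ks @ alpha                                  # posterior mean of f*
cov = Kss - Ks @ np.linalg.solve(K, Ks.T)          # posterior covariance of f*
std = np.sqrt(np.diag(cov))                        # pointwise uncertainty
```

Solving linear systems against \(\matK+\sigma^2\matI\) rather than forming its inverse explicitly is the standard numerically stable choice.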
A subset \(\calW\) of a vector space \(\calV\) over a field \(\bbK\) is a vector subspace if \(\forall x,y\in\calW\), \(\forall \lambda,\mu\in\bbK\): \(\lambda x+\mu y \in\calW\)
If \(\calW\) is a vector subspace of a vector space \(\calV\), \(\calW\) is a vector space.
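For example, \(\calW\eqdef\{\vecx\in\bbR^3 : x_1+x_2+x_3=0\}\) is a vector subspace of \(\bbR^3\): it contains \(\boldsymbol{0}\), and any linear combination of vectors with zero coordinate sum again has zero coordinate sum.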
The properties of vector spaces seen thus far provide an algebraic structure
We are missing a topological structure to measure length and distance
A norm \(\norm{\cdot}:\calV\to\bbR_{\geq 0}\) satisfies \(\norm{x}=0\) iff \(x=0\), \(\norm{\lambda x}=\vert\lambda\vert\norm{x}\), and \(\norm{x+y}\leq\norm{x}+\norm{y}\); \(\norm{x}\) measures a length, \(\norm{x-y}\) measures a distance
In addition to a topological and algebraic structure, what if we want to do geometry?
An inner product space over \(\bbR\) is a vector space \(\calV\) equipped with a positive definite symmetric bilinear form \(\dotp{\cdot}{\cdot}:\calV\times\calV\to\bbR\) called an inner product
An inner product space is also called a pre-Hilbert space
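For example, \(\dotp{\vecx}{\vecy}\eqdef\sum_{i=1}^n x_iy_i\) is an inner product on \(\bbR^n\); every inner product induces a norm \(\norm{x}\eqdef\sqrt{\dotp{x}{x}}\), so an inner product space carries the algebraic, topological, and geometric structures at once.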