Gaussian Processes

Matthieu Bloch

Tuesday November 29, 2022

Today in ECE 6555

  • Announcements
    • 3 lectures left (including today)
    • One more homework (optional, I didn't realize how close the end of the semester was…)
    • One take-home final exam
  • Last time
    • Representer theorem
  • Today
    • Finding RKHS
  • Questions?

Last time: Representing (continuous) linear functionals

  • Let \(F:\calF\to\bbR\) be a linear functional on an \(n\)-dimensional Hilbert space \(\calF\).

    Then there exists \(c\in\calF\) such that \(F(x)=\dotp{x}{c}\) for every \(x\in\calF\)

  • Linear functionals over finite-dimensional Hilbert spaces are continuous!

  • This is not true in infinite dimensions: a linear functional on an infinite-dimensional space need not be continuous

  • Let \(F:\calF\to\bbR\) be a continuous linear functional on a (possibly infinite-dimensional) separable Hilbert space \(\calF\).

    Then there exists \(c\in\calF\) such that \(F(x)=\dotp{x}{c}\) for every \(x\in\calF\)

  • If \(\set{\psi_n}_{n\geq 1}\) is an orthobasis for \(\calF\), then we can construct \(c\) above as \[ c\eqdef \sum_{n=1}^\infty F(\psi_n)\psi_n \]
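A quick finite-dimensional sanity check of this construction (the orthobasis and the functional \(F(x)=\bfa^\intercal x\) below are illustrative choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Orthobasis of R^n: the columns of a random orthogonal matrix Q
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

# An arbitrary linear functional F(x) = a^T x
a = rng.standard_normal(n)
F = lambda x: a @ x

# Representer: c = sum_n F(psi_n) psi_n
c = sum(F(Q[:, k]) * Q[:, k] for k in range(n))

# F(x) and <x, c> agree for any x
x = rng.standard_normal(n)
print(F(x), x @ c)
```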

Last time: Reproducing Kernel Hilbert Spaces

  • An RKHS is a Hilbert space \(\calH\) of real-valued functions \(f:\bbR^d\to\bbR\) in which the sampling operation \(\calS_\bftau:\calH\to\bbR:f\mapsto f(\bftau)\) is continuous for every \(\bftau\in\bbR^d\).

    In other words, for each \(\bftau\in\bbR^d\), there exists \(k_\bftau\in\calH\) s.t. \[ f(\bftau) = {\dotp{f}{k_\bftau}}_\calH\text{ for all } f\in\calH \]

  • The kernel of an RKHS is \[ k:\bbR^d\times\bbR^d\to\bbR:(\bft,\bftau)\mapsto k_{\bftau}(\bft) \] where \(k_\bftau\) is the element of \(\calH\) that defines the sampling at \(\bftau\).

  • A (separable) Hilbert space with orthobasis \(\set{\psi_n}_{n\geq 1}\) is an RKHS iff \(\forall \bftau\in\bbR^d\) \(\sum_{n=1}^\infty\abs{\psi_{n}(\bftau)}^2<\infty\), in which case \(k_\bftau = \sum_{n=1}^\infty \psi_n(\bftau)\psi_n\)
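One concrete instance of this criterion, using a toy finite-dimensional example that is not from the lecture: polynomials of degree less than \(m\) on \([-1,1]\) with the \(L^2\) inner product form an RKHS, with normalized Legendre polynomials as an orthobasis. The sketch below builds \(k_\bftau=\sum_n\psi_n(\bftau)\psi_n\) and checks the reproducing property by quadrature.

```python
import numpy as np
from numpy.polynomial import legendre

m, tau = 5, 0.3  # dimension of the space and sampling location

def psi(n, t):
    """n-th normalized Legendre polynomial: psi_n = sqrt((2n+1)/2) P_n."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return np.sqrt((2 * n + 1) / 2) * legendre.legval(t, c)

def k_tau(t):
    """Kernel section k_tau(t) = sum_n psi_n(tau) psi_n(t)."""
    return sum(psi(n, tau) * psi(n, t) for n in range(m))

f = lambda t: 1 - t + 2 * t**2  # an element of the space (degree < m)

# Reproducing property <f, k_tau>_{L^2} = f(tau), checked with Gauss-Legendre
# quadrature (exact here since all integrands are polynomials)
nodes, weights = legendre.leggauss(50)
print(np.sum(weights * f(nodes) * k_tau(nodes)), f(tau))  # both 0.88
```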

Last time: Representer theorem

  • If \(\calH\) is an RKHS, then \[ \min_{f\in\calH}\sum_{i=1}^n\abs{y_i-f(\vecx_i)}^2+\lambda\norm[\calH]{f}^2 \] has solution \[ f = \sum_{i=1}^n\alpha_i k_{\vecx_i}\textsf{ with } \bfalpha = (\matK+\lambda\matI)^{-1}\vecy\qquad \matK=[k(\vecx_i,\vecx_j)]_{1\leq i,j\leq n} \]

Kernel regression

  • Kernel magic
    1. \(K_{ij} = \dotp{k_{\bfx_i}}{k_{\bfx_j}}_\calH = k_{\bfx_i}(\bfx_j) = k(\bfx_i,\bfx_j)\)
    2. \(\widehat{f}(\bfx) = \dotp{\widehat{f}}{k_{\bfx}}_\calH = \sum_{i=1}^n\widehat{\alpha}_i k(\bfx_i,\bfx)\)
  • Remarks
    • We solved an infinite-dimensional problem using an \(n\times n\) system of equations and linear algebra (see the sketch below)
    • The solution and its evaluation depend only on the kernel; we never need to work directly in \(\calF\)
  • Question: can we skip \(\calF\) entirely? How do we find "good" kernels?
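A minimal end-to-end sketch of this pipeline on synthetic 1-D data (the RBF kernel, its bandwidth, and \(\lambda\) are illustrative choices):

```python
import numpy as np

def rbf(U, V, sigma2=0.1):
    """RBF kernel matrix: k(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    d2 = np.sum(U**2, 1)[:, None] + np.sum(V**2, 1)[None, :] - 2 * U @ V.T
    return np.exp(-d2 / (2 * sigma2))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (30, 1))                          # inputs x_i
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(30)  # noisy samples y_i

# Representer theorem: alpha = (K + lambda I)^{-1} y
lam = 1e-2
alpha = np.linalg.solve(rbf(X, X) + lam * np.eye(len(X)), y)

# Evaluate f(x) = sum_i alpha_i k(x_i, x) on a test grid
Xt = np.linspace(-1, 1, 200)[:, None]
f_hat = rbf(Xt, X) @ alpha
```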

Aronszajn's theorem

  • An inner product kernel is a mapping \(k:\bbR^d\times\bbR^d\to\bbR\) for which there exists a Hilbert space \(\calH\) and a mapping \(\Phi:\bbR^d\to\calH\) such that \[\forall \bfu,\bfv\in\bbR^d\quad k(\bfu,\bfv)=\langle\Phi(\bfu),\Phi(\bfv)\rangle_\calH\]

  • A function \(k:\bbR^d\times\bbR^d\to\bbR\) is a positive semidefinite kernel if
    • \(k\) is symmetric, i.e., \(k(\bfu,\bfv)=k(\bfv,\bfu)\)
    • for every finite set of points \(\{\bfx_i\}_{i=1}^N\), the Gram matrix \(\bfK\) is positive semidefinite, i.e., \[\forall\bfa\in\bbR^N\quad\bfa^\intercal\bfK\bfa\geq 0\text{ with }\bfK=[K_{i,j}]\text{ and }K_{i,j}\eqdef k(\bfx_i,\bfx_j)\]
  • A function \(k:\bbR^d\times\bbR^d\to\bbR\) is an inner product kernel if and only if \(k\) is a positive semidefinite kernel.
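A quick numerical illustration of the positive semidefinite property for the RBF kernel (the points, dimension, and bandwidth are arbitrary choices): the Gram matrix is symmetric with nonnegative eigenvalues.

```python
import numpy as np

def rbf(U, V, sigma2=1.0):
    """RBF kernel matrix: k(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    d2 = np.sum(U**2, 1)[:, None] + np.sum(V**2, 1)[None, :] - 2 * U @ V.T
    return np.exp(-d2 / (2 * sigma2))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))  # 50 arbitrary points in R^3
K = rbf(X, X)

print(np.allclose(K, K.T))          # symmetric
print(np.linalg.eigvalsh(K).min())  # >= 0 up to floating-point round-off
```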

Examples

  • Examples
    • Regression using linear and quadratic functions in \(\bbR^d\)
    • Regression using Radial Basis Functions
  • Examples of kernels
    • Homogeneous polynomial kernel: \(k(\bfu,\bfv) = (\bfu^\intercal\bfv)^m\) with \(m\in\bbN^*\)
    • Inhomogeneous polynomial kernel: \(k(\bfu,\bfv) = (\bfu^\intercal\bfv+c)^m\) with \(c>0\), \(m\in\bbN^*\)
    • Radial basis function (RBF) kernel: \(k(\bfu,\bfv) = \exp\left(-\frac{\norm{\bfu-\bfv}^2}{2\sigma^2}\right)\) with \(\sigma^2>0\)
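For the homogeneous polynomial kernel with \(m=2\) and \(d=2\), the map \(\Phi\) of Aronszajn's theorem can be written out explicitly: \((\bfu^\intercal\bfv)^2 = \langle\Phi(\bfu),\Phi(\bfv)\rangle\) with \(\Phi(\bfu)=(u_1^2,\sqrt{2}\,u_1u_2,u_2^2)\). A quick check:

```python
import numpy as np

def phi(u):
    """Explicit feature map for the homogeneous quadratic kernel on R^2."""
    return np.array([u[0]**2, np.sqrt(2) * u[0] * u[1], u[1]**2])

rng = np.random.default_rng(0)
u, v = rng.standard_normal(2), rng.standard_normal(2)
print((u @ v)**2, phi(u) @ phi(v))  # identical up to round-off
```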