# Reproducing Kernel Hilbert Spaces

Monday, October 25, 2021

## Logistics

• Drop date: October 30, 2021

• My office hours tomorrow

• Tuesdays 8am-9am on BlueJeans (https://bluejeans.com/205357142)
• Come prepared!
• Midterm 2:

• Moved to Monday November 8, 2021 (gives you weekend to prepare)
• Coverage: everything since Midterm 1 (don’t forget the fundamentals though), emphasis on regression

## What’s on the agenda for today?

• Last time:
• Functionals on Hilbert spaces
• Today:
• Reproducing Kernel Hilbert Spaces
• Reading: Romberg, lecture notes 10/11

## Last time: Riesz representation theorem

• Let $F:\calF\to\bbR$ be a continuous linear functional on a (possibly infinite-dimensional) separable Hilbert space $\calF$.

Then there exists a unique $c\in\calF$ such that $F(x)=\dotp{x}{c}$ for every $x\in\calF$

• If $\set{\psi_n}_{n\geq 1}$ is an orthobasis for $\calF$, then we can construct $c$ above as $c\eqdef \sum_{n=1}^\infty F(\psi_n)\psi_n$
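The construction above can be sanity-checked in finite dimensions, where $\bbR^n$ with the standard basis is itself a Hilbert space. A minimal sketch (the functional $F$ and the dimension are made up for the demo):

```python
import numpy as np

# In R^n with the standard orthobasis e_1, ..., e_n, the Riesz representer
# of a linear functional F is c = sum_n F(e_n) e_n.
n = 5
rng = np.random.default_rng(0)
w = rng.standard_normal(n)           # hypothetical functional F(x) = <x, w>
F = lambda x: float(w @ x)

# Build c from the orthobasis, following the construction above.
basis = np.eye(n)
c = sum(F(psi) * psi for psi in basis)

# Check: F(x) = <x, c> for an arbitrary x (here, c recovers w exactly).
x = rng.standard_normal(n)
assert np.isclose(F(x), x @ c)
```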

## Reproducing Kernel Hilbert Spaces

• An RKHS is a Hilbert space $\calH$ of real-valued functions $f:\bbR^d\to\bbR$ in which the sampling operation $\calS_\bftau:\calH\to\bbR:f\mapsto f(\bftau)$ is continuous for every $\bftau\in\bbR^d$.

In other words, for each $\bftau\in\bbR^d$, there exists $k_\bftau\in\calH$ s.t. $f(\bftau) = {\dotp{f}{k_\bftau}}_\calH\text{ for all } f\in\calH$

• The kernel of an RKHS is $k:\bbR^d\times\bbR^d\to\bbR:(\bft,\bftau)\mapsto k_{\bftau}(\bft)$ where $k_\bftau$ is the element of $\calH$ that defines the sampling at $\bftau$.

• A (separable) Hilbert space with orthobasis $\set{\psi_n}_{n\geq 1}$ is an RKHS with kernel $k(\bft,\bftau)=\sum_{n=1}^\infty\psi_n(\bftau)\psi_n(\bft)$ iff $\forall \bftau\in\bbR^d$ $\sum_{n=1}^\infty\abs{\psi_{n}(\bftau)}^2<\infty$

## RKHS and non-orthogonal bases

• If $\set{\phi_n}_{n\geq 1}$ is a Riesz basis for $\calH$, we know that every $x\in\calH$ can be written $x = \sum_{n\geq 1}\alpha_n\phi_n\textsf{ with } \alpha_n\eqdef\dotp{x}{\smash{\widetilde{\phi}_n}}$ where $\set{\widetilde{\phi}_n}_{n\geq 1}$ is the dual basis.

• A (separable) Hilbert space with Riesz basis $\set{\phi_n}_{n\geq 1}$ is an RKHS with kernel $k(\bft,\bftau) =\sum_{n=1}^\infty \phi_n(\bftau)\widetilde{\phi}_n(\bft)$ iff $\forall \bftau\in\bbR^d$ $\sum_{n=1}^\infty\abs{\phi_{n}(\bftau)}^2<\infty$

## Examples

• Finite dimensional Hilbert space

• Space of $L$th order polynomial splines on the real line

• Remark

• An RKHS is more easily characterized by its kernel
• Often, we try to avoid an explicit description of the elements in the space

## Kernel regression

• Regression problem: given $n$ pairs $(\bfx_i,y_i)\in\bbR^d\times\bbR$, solve $\min_{f\in\calF}\sum_{i=1}^n\abs{y_i-f(\bfx_i)}^2+\lambda\norm[\calF]{f}^2$

• If we restrict $\calF$ to be an RKHS, the problem becomes $\min_{f\in\calF}\sum_{i=1}^n\abs{y_i-{\dotp{f}{x_i}}_{\calF}}^2+\lambda\norm[\calF]{f}^2$

where $x_i\eqdef k_{\bfx_i}$ provides the mapping between $\bbR^d$ and $\calF$: $x_i:\bbR^d\to\bbR:\bft\mapsto k_{\bfx_i}(\bft) = k(\bfx_i,\bft)$

• The solution is given by $\widehat{f} = \sum_{i=1}^n \widehat{\alpha}_i x_i\textsf{ with }\widehat{\bfalpha}\eqdef (\bfK+\lambda\bfI)^{-1}\bfy$ and $\bfK\eqdef[K_{i,j}]_{1\leq i,j\leq n}$ with $K_{i,j}=\dotp{x_i}{x_j}$
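The closed-form solution $\widehat{\bfalpha}=(\bfK+\lambda\bfI)^{-1}\bfy$ is a few lines of linear algebra. A minimal sketch, using a Gaussian RBF kernel as an illustrative choice (the data, $\sigma$, and $\lambda$ are made up for the demo):

```python
import numpy as np

# Kernel ridge regression: alpha_hat = (K + lambda I)^{-1} y,
# with K_{ij} = k(x_i, x_j). Synthetic data in R^2 for illustration.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(20, 2))              # n = 20 points, d = 2
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(20)

def rbf(U, V, sigma=0.5):
    # Pairwise Gaussian RBF kernel between rows of U and rows of V.
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

lam = 1e-2
K = rbf(X, X)                                     # n x n Gram matrix
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)

# Evaluate f_hat at new points using only kernel evaluations.
X_new = rng.uniform(-1, 1, size=(5, 2))
f_hat = rbf(X_new, X) @ alpha
```

Solving the $n\times n$ linear system (rather than explicitly inverting) is the standard numerically stable choice.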

## Kernel regression

• Kernel magic
1. $K_{ij} = \dotp{x_i}{x_j}=\dotp{k_{\bfx_i}}{k_{\bfx_j}} = k_{\bfx_i}(\bfx_j) = k(\bfx_i,\bfx_j)$
2. $\widehat{f}(\bfx) = \dotp{\widehat{f}}{k_{\bfx}} = \sum_{i=1}^n\widehat{\alpha}_i k(\bfx_i,\bfx)$
• Remarks
• We solved an infinite dimensional problem using an $n\times n$ system of equations and linear algebra
• Our solution and the evaluation only depend on the kernel; we never need to work directly in $\calF$
• Question: can we skip $\calF$ entirely? How do we find “good” kernels?

## Aronszajn’s theorem

• An inner product kernel is a mapping $k:\bbR^d\times\bbR^d\to\bbR$ for which there exists a Hilbert space $\calH$ and a mapping $\Phi:\bbR^d\to\calH$ such that $\forall \bfu,\bfv\in\bbR^d\quad k(\bfu,\bfv)=\langle\Phi(\bfu),\Phi(\bfv)\rangle_\calH$

• A function $k:\bbR^d\times\bbR^d\to\bbR$ is a positive semidefinite kernel if
• $k$ is symmetric, i.e., $k(\bfu,\bfv)=k(\bfv,\bfu)$
• for all finite sets of points $\{\bfx_i\}_{i=1}^N\subset\bbR^d$, the Gram matrix $\bfK$ is positive semidefinite, i.e., $\bfa^\intercal\bfK\bfa\geq 0\text{ for all }\bfa\in\bbR^N,\text{ with }\bfK=[K_{i,j}]\text{ and }K_{i,j}\eqdef k(\bfx_i,\bfx_j)$
• A function $k:\bbR^d\times\bbR^d\to\bbR$ is an inner product kernel if and only if $k$ is a positive semidefinite kernel.
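The positive-semidefiniteness condition is easy to probe numerically: draw random points, form the Gram matrix, and inspect its eigenvalues. A minimal sketch with the RBF kernel (this checks the condition on one random sample, not a proof):

```python
import numpy as np

# Sanity check of the PSD condition for the Gaussian RBF kernel:
# the Gram matrix of any point set should be symmetric with
# nonnegative eigenvalues (up to floating-point round-off).
rng = np.random.default_rng(2)
X = rng.standard_normal((30, 3))                   # 30 random points in R^3

def k_rbf(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

K = np.array([[k_rbf(xi, xj) for xj in X] for xi in X])
assert np.allclose(K, K.T)                         # symmetry
assert np.linalg.eigvalsh(K).min() > -1e-10        # PSD up to round-off
```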

## Examples

• Regression using linear and quadratic functions in $\bbR^d$
• Regression using Radial Basis Functions
• Examples of kernels
• Homogeneous polynomial kernel: $k(\bfu,\bfv) = (\bfu^\intercal\bfv)^m$ with $m\in\bbN^*$
• Inhomogeneous polynomial kernel: $k(\bfu,\bfv) = (\bfu^\intercal\bfv+c)^m$ with $c>0$, $m\in\bbN^*$
• Radial basis function (RBF) kernel: $k(\bfu,\bfv) = \exp\left(-\frac{\norm{\bfu-\bfv}^2}{2\sigma^2}\right)$ with $\sigma^2>0$
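The three kernels above are one-liners, and for the homogeneous polynomial kernel with $m=2$ in $\bbR^2$ an explicit feature map $\Phi$ can be written down and checked against Aronszajn's characterization. A minimal sketch ($m$, $c$, $\sigma$ are free parameters; the test points are made up):

```python
import numpy as np

# The three example kernels as plain functions of u, v in R^d.
def poly_hom(u, v, m=2):
    return (u @ v) ** m                  # homogeneous polynomial

def poly_inhom(u, v, m=2, c=1.0):
    return (u @ v + c) ** m              # inhomogeneous polynomial

def rbf(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

# For poly_hom with m = 2 in R^2, the explicit feature map
# Phi(u) = (u1^2, sqrt(2) u1 u2, u2^2) satisfies k(u,v) = <Phi(u), Phi(v)>.
Phi = lambda u: np.array([u[0] ** 2, np.sqrt(2) * u[0] * u[1], u[1] ** 2])
u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(poly_hom(u, v), Phi(u) @ Phi(v))
```

Note that $\Phi$ maps into $\bbR^3$ even though the inputs live in $\bbR^2$; for the RBF kernel no finite-dimensional feature map exists, which is exactly why working through the kernel alone is attractive.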