# Reproducing Kernel Hilbert Spaces

Monday, October 25, 2021

## Logistics

• Drop date: October 30, 2021

• My office hours tomorrow

• Tuesdays 8am-9am on BlueJeans (https://bluejeans.com/205357142)
• Come prepared!
• Midterm 2:

• Moved to Monday November 8, 2021 (gives you weekend to prepare)
• Coverage: everything since Midterm 1 (don’t forget the fundamentals though), emphasis on regression

## What’s on the agenda for today?

• Last time:
• Functionals on Hilbert spaces
• Today:
• Reproducing Kernel Hilbert Spaces
• Reading: Romberg, lecture notes 10/11

## Last time: Riesz representation theorem

• Let $F:\calF\to\bbR$ be a continuous linear functional on a (possibly infinite-dimensional) separable Hilbert space $\calF$.

Then there exists a unique $c\in\calF$ such that $F(x)=\dotp{x}{c}$ for every $x\in\calF$

• If $\set{\psi_n}_{n\geq 1}$ is an orthobasis for $\calF$, then we can construct $c$ above as $c\eqdef \sum_{n=1}^\infty F(\psi_n)\psi_n$
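The construction above can be sanity-checked in finite dimensions, where $\bbR^n$ with the standard basis is itself a Hilbert space. A minimal sketch (the functional $F$ and the dimension are made up for the demo):

```python
import numpy as np

# In R^n with the standard orthobasis e_1, ..., e_n, the Riesz representer
# of a linear functional F is c = sum_n F(e_n) e_n.
n = 5
rng = np.random.default_rng(0)
w = rng.standard_normal(n)           # hypothetical functional F(x) = <x, w>
F = lambda x: float(w @ x)

# Build c from the orthobasis, following the construction above.
basis = np.eye(n)
c = sum(F(psi) * psi for psi in basis)

# Check: F(x) = <x, c> for an arbitrary x (here, c recovers w exactly).
x = rng.standard_normal(n)
assert np.isclose(F(x), x @ c)
```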

## Reproducing Kernel Hilbert Spaces

• An RKHS is a Hilbert space $\calH$ of real-valued functions $f:\bbR^d\to\bbR$ in which the sampling operation $\calS_\bftau:\calH\to\bbR:f\mapsto f(\bftau)$ is continuous for every $\bftau\in\bbR^d$.

In other words, for each $\bftau\in\bbR^d$, there exists $k_\bftau\in\calH$ s.t. $f(\bftau) = {\dotp{f}{k_\bftau}}_\calH\text{ for all } f\in\calH$

• The kernel of an RKHS is $k:\bbR^d\times\bbR^d\to\bbR:(\bft,\bftau)\mapsto k_{\bftau}(\bft)$ where $k_\bftau$ is the element of $\calH$ that defines the sampling at $\bftau$.

• A (separable) Hilbert space with orthobasis $\set{\psi_n}_{n\geq 1}$ is an RKHS with kernel $k(\bft,\bftau)=\sum_{n=1}^\infty\psi_n(\bftau)\psi_n(\bft)$ iff $\forall \bftau\in\bbR^d$ $\sum_{n=1}^\infty\abs{\psi_{n}(\bftau)}^2<\infty$

## RKHS and non-orthogonal bases

• If $\set{\phi_n}_{n\geq 1}$ is a Riesz basis for $\calH$, we know that every $x\in\calH$ can be written $x = \sum_{n\geq 1}\alpha_n\phi_n\textsf{ with } \alpha_n\eqdef\dotp{x}{\smash{\widetilde{\phi}_n}}$ where $\set{\widetilde{\phi}_n}_{n\geq 1}$ is the dual basis.

• A (separable) Hilbert space with Riesz basis $\set{\phi_n}_{n\geq 1}$ is an RKHS with kernel $k(\bft,\bftau) =\sum_{n=1}^\infty \phi_n(\bftau)\widetilde{\phi}_n(\bft)$ iff $\forall \bftau\in\bbR^d$ $\sum_{n=1}^\infty\abs{\phi_{n}(\bftau)}^2<\infty$

## Examples

• Finite dimensional Hilbert space

• Space of $L$th order polynomial splines on the real line

• Remark

• An RKHS is more easily characterized by its kernel
• Often, we try to avoid an explicit description of the elements in the space

## Kernel regression

• Regression problem: given $n$ pairs $(\bfx_i,y_i)\in\bbR^d\times\bbR$, solve $\min_{f\in\calF}\sum_{i=1}^n\abs{y_i-f(\bfx_i)}^2+\lambda\norm[\calF]{f}^2$

• If we restrict $\calF$ to be an RKHS, the problem becomes $\min_{f\in\calF}\sum_{i=1}^n\abs{y_i-{\dotp{f}{x_i}}_{\calF}}^2+\lambda\norm[\calF]{f}^2$

where $x_i\eqdef k_{\bfx_i}$ provides the mapping between $\bbR^d$ and $\calF$: $x_i:\bbR^d\to\bbR:\bft\mapsto k_{\bfx_i}(\bft) = k(\bfx_i,\bft)$

• The solution is given by $\widehat{f} = \sum_{i=1}^n \widehat{\alpha}_i x_i\textsf{ with }\widehat{\bfalpha}\eqdef (\bfK+\lambda\bfI)^{-1}\bfy$ and $\bfK\eqdef[K_{i,j}]_{1\leq i,j\leq n}$ with $K_{i,j}=\dotp{x_i}{x_j}$
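The closed-form solution $\widehat{\bfalpha}=(\bfK+\lambda\bfI)^{-1}\bfy$ is a few lines of linear algebra. A minimal sketch, using a Gaussian RBF kernel as an illustrative choice (the data, $\sigma$, and $\lambda$ are made up for the demo):

```python
import numpy as np

# Kernel ridge regression: alpha_hat = (K + lambda I)^{-1} y,
# with K_{ij} = k(x_i, x_j). Synthetic data in R^2 for illustration.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(20, 2))              # n = 20 points, d = 2
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(20)

def rbf(U, V, sigma=0.5):
    # Pairwise Gaussian RBF kernel between rows of U and rows of V.
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

lam = 1e-2
K = rbf(X, X)                                     # n x n Gram matrix
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)

# Evaluate f_hat at new points using only kernel evaluations.
X_new = rng.uniform(-1, 1, size=(5, 2))
f_hat = rbf(X_new, X) @ alpha
```

Solving the $n\times n$ linear system (rather than explicitly inverting) is the standard numerically stable choice.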

## Kernel regression

• Kernel magic
1. $K_{ij} = \dotp{x_i}{x_j}=\dotp{k_{\bfx_i}}{k_{\bfx_j}} = k_{\bfx_i}(\bfx_j) = k(\bfx_i,\bfx_j)$
2. $\widehat{f}(\bfx) = \dotp{\widehat{f}}{k_{\bfx}} = \sum_{i=1}^n\widehat{\alpha}_i k(\bfx_i,\bfx)$
• Remarks
• We solved an infinite dimensional problem using an $n\times n$ system of equations and linear algebra
• Our solution and the evaluation only depend on the kernel; we never need to work directly in $\calF$
• Question: can we skip $\calF$ entirely? How do we find “good” kernels?

## Aronszajn’s theorem

• An inner product kernel is a mapping $k:\bbR^d\times\bbR^d\to\bbR$ for which there exists a Hilbert space $\calH$ and a mapping $\Phi:\bbR^d\to\calH$ such that $\forall \bfu,\bfv\in\bbR^d\quad k(\bfu,\bfv)=\langle\Phi(\bfu),\Phi(\bfv)\rangle_\calH$

• A function $k:\bbR^d\times\bbR^d\to\bbR$ is a positive semidefinite kernel if
• $k$ is symmetric, i.e., $k(\bfu,\bfv)=k(\bfv,\bfu)$
• for all finite sets of points $\{\bfx_i\}_{i=1}^N\subset\bbR^d$, the Gram matrix $\bfK$ is positive semidefinite, i.e., $\bfa^\intercal\bfK\bfa\geq 0\text{ for all }\bfa\in\bbR^N,\text{ with }\bfK=[K_{i,j}]\text{ and }K_{i,j}\eqdef k(\bfx_i,\bfx_j)$
• A function $k:\bbR^d\times\bbR^d\to\bbR$ is an inner product kernel if and only if $k$ is a positive semidefinite kernel.
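The positive-semidefiniteness condition is easy to probe numerically: draw random points, form the Gram matrix, and inspect its eigenvalues. A minimal sketch with the RBF kernel (this checks the condition on one random sample, not a proof):

```python
import numpy as np

# Sanity check of the PSD condition for the Gaussian RBF kernel:
# the Gram matrix of any point set should be symmetric with
# nonnegative eigenvalues (up to floating-point round-off).
rng = np.random.default_rng(2)
X = rng.standard_normal((30, 3))                   # 30 random points in R^3

def k_rbf(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

K = np.array([[k_rbf(xi, xj) for xj in X] for xi in X])
assert np.allclose(K, K.T)                         # symmetry
assert np.linalg.eigvalsh(K).min() > -1e-10        # PSD up to round-off
```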

## Examples

• Regression using linear and quadratic functions in $\bbR^d$
• Regression using Radial Basis Functions
• Examples of kernels
• Homogeneous polynomial kernel: $k(\bfu,\bfv) = (\bfu^\intercal\bfv)^m$ with $m\in\bbN^*$
• Inhomogeneous polynomial kernel: $k(\bfu,\bfv) = (\bfu^\intercal\bfv+c)^m$ with $c>0$, $m\in\bbN^*$
• Radial basis function (RBF) kernel: $k(\bfu,\bfv) = \exp\left(-\frac{\norm{\bfu-\bfv}^2}{2\sigma^2}\right)$ with $\sigma^2>0$
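The three kernels above are one-liners, and for the homogeneous polynomial kernel with $m=2$ in $\bbR^2$ an explicit feature map $\Phi$ can be written down and checked against Aronszajn's characterization. A minimal sketch ($m$, $c$, $\sigma$ are free parameters; the test points are made up):

```python
import numpy as np

# The three example kernels as plain functions of u, v in R^d.
def poly_hom(u, v, m=2):
    return (u @ v) ** m                  # homogeneous polynomial

def poly_inhom(u, v, m=2, c=1.0):
    return (u @ v + c) ** m              # inhomogeneous polynomial

def rbf(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

# For poly_hom with m = 2 in R^2, the explicit feature map
# Phi(u) = (u1^2, sqrt(2) u1 u2, u2^2) satisfies k(u,v) = <Phi(u), Phi(v)>.
Phi = lambda u: np.array([u[0] ** 2, np.sqrt(2) * u[0] * u[1], u[1] ** 2])
u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(poly_hom(u, v), Phi(u) @ Phi(v))
```

Note that $\Phi$ maps into $\bbR^3$ even though the inputs live in $\bbR^2$; for the RBF kernel no finite-dimensional feature map exists, which is exactly why working through the kernel alone is attractive.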