Matthieu R Bloch
\(K:\calX\times\calX\to\bbR\) is an inner product kernel if and only if \(K\) is a positive definite kernel.
Useful to implicitly define a transformation of feature vectors \(\bfx\in\bbR^d\) to \(\Phi(\bfx)\in\calH\)
Particularly useful when a machine learning task can be kernelized (SVM, etc.)
An inner product space over \(\bbR\) is a vector space \(\calV\) equipped with a positive definite symmetric bilinear form \(\dotp{\cdot}{\cdot}:\calV\times\calV\to\bbR\) called an inner product
An inner product space is also called a pre-Hilbert space
An inner product satisfies the Cauchy–Schwarz inequality: \(\forall x,y\in\calV\) \(\dotp{x}{y}^2\leq\dotp{x}{x}\dotp{y}{y}\)
\[\bfx\in\bbR^d\qquad\norm[0]{\bfx}\eqdef\card{\set{i:x_i\neq 0}}\quad\norm[1]{\bfx}\eqdef\sum_{i=1}^d\abs{x_i}\quad \norm[2]{\bfx}\eqdef\sqrt{\sum_{i=1}^d x_i^2}\]
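The three norms above are easy to compute numerically; a minimal sketch with NumPy (the vector is an arbitrary illustrative choice):

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0])

# "l0 norm": number of nonzero entries (not a true norm: it is not homogeneous)
l0 = np.count_nonzero(x)
# l1 norm: sum of absolute values
l1 = np.abs(x).sum()
# l2 norm: Euclidean length
l2 = np.sqrt((x ** 2).sum())
```

Note that only \(\norm[2]{\cdot}\) is induced by an inner product (the usual dot product on \(\bbR^d\)).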
In an inner product space, an inner product induces a norm \(\norm{x} \eqdef \sqrt{\dotp{x}{x}}\)
A norm \(\norm{\cdot}\) is induced by an inner product on \(\calV\) iff \(\forall x,y\in\calV\) \(\norm{x}^2+\norm{y}^2 = \frac{1}{2}\left(\norm{x+y}^2+\norm{x-y}^2\right)\)
If this is the case, the inner product is given by the polarization identity \[\dotp{x}{y}=\frac{1}{2}\left(\norm{x}^2+\norm{y}^2-\norm{x-y}^2\right)\]
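Both identities can be checked numerically for the Euclidean inner product; a minimal sketch (random vectors are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)

def sqnorm(v):
    return float(v @ v)

# parallelogram law: ||x||^2 + ||y||^2 = (||x+y||^2 + ||x-y||^2) / 2
lhs = sqnorm(x) + sqnorm(y)
rhs = 0.5 * (sqnorm(x + y) + sqnorm(x - y))

# polarization identity: recover the inner product from the norm alone
ip = 0.5 * (sqnorm(x) + sqnorm(y) - sqnorm(x - y))
```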
Two vectors \(x,y\in\calV\) are orthogonal if \(\dotp{x}{y}=0\). We write \(x\perp y\) for simplicity.
A vector \(x\in\calV\) is orthogonal to a set \(\calS\subset\calV\) if \(\forall s\in\calS\) \(\dotp{x}{s}=0\). We write \(x\perp \calS\) for simplicity.
If \(x\perp y\) then \(\norm{x+y}^2=\norm{x}^2+\norm{y}^2\)
A Cauchy sequence in \(\calV\) is a sequence \(\set{x_n}_{n\geq 0}\) such that \(\forall\epsilon>0\) \(\exists N\in\bbN^*\) such that \(\forall n,m>N\) \(\norm{x_n-x_m}\leq \epsilon\)
A Hilbert space \(\calH\) is a complete inner product space, i.e., an inner product space in which every Cauchy sequence in \(\calH\) converges to a point in \(\calH\)
Intuition: a Hilbert space has no “missing point”
A finite dimensional inner product space is complete (many interesting Hilbert spaces are infinite dimensional!)
Let \(\calH\) be a Hilbert space and \(\calM\) a closed subspace of \(\calH\). For any \(x\in\calH\), \(\exists\) a unique \(m^*\in\calM\) for which \[ \forall m\in\calM\quad \norm{x-m^*}\leq\norm{x-m} \] Furthermore \(m^*\) is the unique vector \(m\in\calM\) such that \(x-m\perp \calM\).
The vector \(m^*\) is called the orthogonal projection of \(x\) onto \(\calM\).
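In \(\bbR^d\) with the usual inner product, the projection onto the column space of a matrix can be computed by least squares; a minimal sketch (matrix and vector are illustrative), checking the orthogonality characterization of \(m^*\):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))   # columns span a (closed) subspace M of R^5
x = rng.standard_normal(5)

# orthogonal projection of x onto M = range(A): least-squares solution
coef, *_ = np.linalg.lstsq(A, x, rcond=None)
m_star = A @ coef

# the residual x - m_star is orthogonal to M: A^T (x - m_star) ~ 0
orth = A.T @ (x - m_star)
```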
The (topological) dual of a Hilbert space \(\calH\) is the space of continuous linear functionals \(f:\calH\to\bbR\).
In a Hilbert space \(\calH\), every continuous linear functional is of the form \(\dotp{\cdot}{g}_\calH\) for a unique \(g\in\calH\) (Riesz representation theorem).
A kernel \(K:\calX\times\calX\to\bbR\) is a reproducing kernel of a Hilbert space \(\calH\subset\bbR^\calX\) if \(\forall x\in\calX\) \(K_x\eqdef K(\cdot,x)\in\calH\) and \(\forall f\in\calH\) \(f(x)=\dotp{f}{K_x}_\calH\) (the reproducing property). If a reproducing kernel exists, \(\calH\) is called a reproducing kernel Hilbert space (RKHS).
An RKHS is a space of functions
Every \(x\in\calX\) can be mapped to a function in \(\calH\) with the kernel map \(\Phi:\calX\to\calH:x\mapsto K_x\)
The kernel and the reproducing property require an RKHS to be very “well behaved”
Compare to the Hilbert space \(L^2\) of square integrable functions: its elements are only defined up to sets of measure zero, so pointwise evaluation \(f\mapsto f(x)\) is not even well defined.
A Hilbert space \(\calH\subset \bbR^\calX\) is an RKHS if and only if for all \(x\in\calX\) the linear functional \(\delta_x:\calH\to\bbR:f\mapsto f(x)\) is continuous.
Convergence in an RKHS implies pointwise convergence.
An RKHS \(\calH\) has a unique reproducing kernel \(K\). Conversely, a reproducing kernel \(K:\calX\times\calX\to\bbR\) is the reproducing kernel of a unique RKHS.
We can talk equivalently about a reproducing kernel or an RKHS without ambiguity
For any function \(f\) in an RKHS \(\calH\), for any \(x,y\in\calX\) \[\abs{f(x)-f(y)}\leq \norm[\calH]{f}\norm[\calH]{K_x-K_y}\]
The norm \(\norm[\calH]{f}\) controls how fast the function varies w.r.t. the geometry defined by the kernel on \(\calX\).
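The RKHS distance in the bound is computable from kernel evaluations alone, since \(\norm[\calH]{K_x-K_y}^2 = K(x,x)-2K(x,y)+K(y,y)\) by the reproducing property. A minimal sketch with the linear kernel \(K(\bfx,\bfy)=\bfx^\intercal\bfy\), for which this distance reduces to the Euclidean distance:

```python
import numpy as np

def K(x, y):
    # linear kernel on R^d
    return float(x @ y)

def rkhs_dist(x, y):
    # ||K_x - K_y||_H^2 = K(x,x) - 2 K(x,y) + K(y,y)
    return np.sqrt(K(x, x) - 2.0 * K(x, y) + K(y, y))

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
d = rkhs_dist(x, y)   # equals ||x - y||_2 for the linear kernel
```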
A function \(K:\calX\times\calX\to\bbR\) is a reproducing kernel if and only if it is a positive definite kernel.
This is essentially Aronszajn’s theorem!
Take \(\calX=\bbR^d\). The linear kernel is \(K(\bfx,\bfy)=\dotp{\bfx}{\bfy}_{\bbR^d}=\bfx^\intercal\bfy\).
The polynomial kernel of degree 2 is \(K(\bfx,\bfy)=\dotp{\bfx}{\bfy}_{\bbR^d}^2=(\bfx^\intercal\bfy)^2\).
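The degree-2 polynomial kernel implicitly computes an inner product of the \(d^2\)-dimensional feature vectors of pairwise products \((x_ix_j)_{i,j}\), since \((\bfx^\intercal\bfy)^2=\sum_{i,j}x_ix_jy_iy_j\). A minimal sketch checking this (vectors are illustrative):

```python
import numpy as np

def poly2(x, y):
    # degree-2 polynomial kernel K(x, y) = (x^T y)^2
    return float(x @ y) ** 2

def Phi(x):
    # explicit feature map: all pairwise products x_i x_j (dimension d^2)
    return np.outer(x, x).ravel()

x = np.array([1.0, 2.0, 3.0])
y = np.array([-1.0, 0.5, 2.0])

k_implicit = poly2(x, y)             # (x^T y)^2
k_explicit = float(Phi(x) @ Phi(y))  # <Phi(x), Phi(y)>
```

The kernel evaluation costs \(O(d)\), while the explicit feature map costs \(O(d^2)\): this gap is the point of kernelization.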
One can easily combine kernels to create new ones
Let \(K_1\) and \(K_2\) be positive definite kernels and let \(\alpha\geq 0\). Then, \(K_1+K_2\), \(K_1\times K_2\) and \(\alpha K_1\) are positive definite kernels.
If \(\set{K_i}_{i\geq 1}\) is a sequence of positive definite kernels that converges pointwise to \(K\), then \(K\) is a positive definite kernel.
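The closure properties can be checked numerically on Gram matrices, whose eigenvalues must be nonnegative for a positive definite kernel. A minimal sketch with the linear and degree-2 polynomial kernels (data is illustrative; \(K_1\times K_2\) means the pointwise product, so its Gram matrix is the entrywise product):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))

K1 = X @ X.T          # Gram matrix of the linear kernel
K2 = (X @ X.T) ** 2   # Gram matrix of the degree-2 polynomial kernel

# smallest eigenvalue of each combined Gram matrix (sum, pointwise product, scaling)
min_eigs = [np.linalg.eigvalsh(G).min() for G in (K1 + K2, K1 * K2, 3.0 * K1)]
```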
Let \(\calX\) be a set endowed with a positive definite kernel \(K\). Let \(\calH\) be the associated RKHS.
Let \(\calS\eqdef\set{x_i}_{i=1}^n\) be a finite set of points in \(\calX\).
Let \(\Psi:\bbR^{n+1}\to\bbR\) be strictly increasing w.r.t. the last variable.
Any solution of the problem \(\min_{f\in\calH}\Psi(f(x_1),\cdots,f(x_n),\norm[\calH]{f})\) is of the form \[ \forall x\in\calX\quad f(x) = \bfalpha^\intercal \bfK(x) = \sum_{i=1}^n\alpha_iK(x_i,x) \] where \(\bfalpha\eqdef\mat{ccc}{\alpha_1&\cdots&\alpha_n}^\intercal\) and \(\bfK(x)\eqdef\mat{ccc}{K(x_1,x)&\cdots&K(x_n,x)}^\intercal\).
The solution lives in a subspace of dimension at most \(n\), spanned by \(\set{K_{x_i}}_{i=1}^n\)
Practical implication: Let \(\bfK\eqdef\mat{c}{K(x_i,x_j)}_{i,j\in\intseq{1}{n}}\) be the Gram matrix of \(\calS\). We can equivalently solve \[ \min_{\bfalpha} \Psi(\mat{c}{\bfK\bfalpha}_1,\cdots,\mat{c}{\bfK\bfalpha}_n,\sqrt{\bfalpha^\intercal\bfK\bfalpha}) \] where \(\mat{c}{\bfK\bfalpha}_j = \bfalpha^\intercal\bfK(x_j) = f(x_j)\) and \(\norm[\calH]{f}^2=\bfalpha^\intercal\bfK\bfalpha\). This problem can usually be solved analytically or by numerical methods
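As a concrete instance, take squared loss with a ridge penalty, \(\Psi=\sum_{i=1}^n(f(x_i)-y_i)^2+\lambda\norm[\calH]{f}^2\) (strictly increasing in \(\norm[\calH]{f}\geq 0\)): the finite-dimensional problem then admits the closed-form solution \(\bfalpha=(\bfK+\lambda\bfI)^{-1}\bfy\), i.e., kernel ridge regression. A minimal sketch, assuming NumPy, synthetic data, and a Gaussian kernel (all illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
X = rng.uniform(-1, 1, size=(n, 1))   # illustrative inputs x_1, ..., x_n
y = np.sin(3 * X[:, 0])               # illustrative targets y_1, ..., y_n

# Gram matrix of a Gaussian kernel (an illustrative positive definite kernel)
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 0.5)

# squared loss + lam * ||f||_H^2 in alpha: minimize ||K a - y||^2 + lam a^T K a,
# solved by alpha = (K + lam I)^{-1} y
lam = 1e-3
alpha = np.linalg.solve(K + lam * np.eye(n), y)

f_train = K @ alpha   # fitted values f(x_i) = [K alpha]_i
```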
Principle: map data from \(\calX\) to RKHS \(\calH\) defined by kernel \(K\) as \[ \varphi:\calX\to\calH:x\mapsto K_x\eqdef K(\cdot,x) \]
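In code, the kernel map simply sends a point to the partially applied kernel, i.e., the function \(K_x=K(\cdot,x)\); a minimal sketch with the linear kernel on \(\bbR\) (names are illustrative):

```python
def K(t, x):
    # linear kernel on R
    return t * x

def phi(x):
    # kernel map: x -> K_x = K(., x), represented as a function of t
    return lambda t: K(t, x)

f = phi(2.0)     # f is the function K_2 = K(., 2) in the RKHS
val = f(3.0)     # evaluating K_2 at t = 3
```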