Matthieu Bloch
Thursday, November 17, 2022
Suppose that \[ \matM = \left[\begin{array}{cc}\matM_{11}&\matM_{12}\\\matM_{21}&\matM_{22}\end{array}\right]. \] If \(\matM_{22}\) is invertible, the Schur complement of \(\matM_{22}\) in \(\matM\) is \(\matS_{22} \eqdef \matM_{11}-\matM_{12}\matM_{22}^{-1}\matM_{21}\).
If \(\matM_{11}\) is invertible, the Schur complement of \(\matM_{11}\) in \(\matM\) is \(\matS_{11} \eqdef \matM_{22}-\matM_{21}\matM_{11}^{-1}\matM_{12}\).
\[ \matM^{-1} = \left[\begin{array}{cc}\matS_{22}^{-1}&-\matS_{22}^{-1}\matM_{12}\matM_{22}^{-1}\\-\matM_{22}^{-1}\matM_{21}\matS_{22}^{-1}&\matM_{22}^{-1}+\matM_{22}^{-1}\matM_{21}\matS_{22}^{-1}\matM_{12}\matM_{22}^{-1}\end{array}\right]. \] \[ \matM^{-1} = \left[\begin{array}{cc}\matM_{11}^{-1}+\matM_{11}^{-1}\matM_{12}\matS_{11}^{-1}\matM_{21}\matM_{11}^{-1}&-\matM_{11}^{-1}\matM_{12}\matS_{11}^{-1}\\-\matS_{11}^{-1}\matM_{21}\matM_{11}^{-1}&\matS_{11}^{-1}\end{array}\right]. \]
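As a quick sanity check, here is a small NumPy sketch (with a random generic matrix, assumed invertible, purely illustrative) that rebuilds \(\matM^{-1}\) from the Schur complement \(\matS_{22}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 3, 2
M = rng.standard_normal((n1 + n2, n1 + n2))  # generic M: invertible w.p. 1
M11, M12 = M[:n1, :n1], M[:n1, n1:]
M21, M22 = M[n1:, :n1], M[n1:, n1:]

S22 = M11 - M12 @ np.linalg.solve(M22, M21)  # Schur complement of M22 in M
S22_inv, M22_inv = np.linalg.inv(S22), np.linalg.inv(M22)

# Block formula for M^{-1} in terms of S22
Minv = np.block([
    [S22_inv, -S22_inv @ M12 @ M22_inv],
    [-M22_inv @ M21 @ S22_inv,
     M22_inv + M22_inv @ M21 @ S22_inv @ M12 @ M22_inv],
])
assert np.allclose(Minv, np.linalg.inv(M))
```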
The distribution of \(\vecf^*\) conditioned on \(X\), \(X^*\) and \(\vecf\) is \[ \calN\left(\matK(X^*,X)\matK(X,X)^{-1}\vecf,\matK(X^*,X^*)-\matK(X^*,X)\matK(X,X)^{-1}\matK(X,X^*)\right) \] Hence we can estimate \(\vecf^*\) as \(\matK(X^*,X)\matK(X,X)^{-1}\vecf\) and we can quantify our uncertainty.
Often, we only observe \(y = f(x)+{\varepsilon}\) with \({\varepsilon}\sim\calN(0,\sigma^2)\) (i.i.d. across different measurements)
\[ \left[\begin{array}{c}\vecy\\\vecf^*\end{array}\right]\sim\calN\left(\boldsymbol{0},\left[\begin{array}{cc}\matK(X,X)+\sigma^2\matI&\matK(X,X^*)\\\matK(X^*,X)&\matK(X^*,X^*)\end{array}\right]\right) \]
The distribution of \(\vecf^*\) conditioned on \(X\), \(X^*\) and \(\vecy\) is \[ \calN\left(\matK(X^*,X)(\matK(X,X)+\sigma^2\matI)^{-1}\vecy,\matK(X^*,X^*)-\matK(X^*,X)(\matK(X,X)+\sigma^2\matI)^{-1}\matK(X,X^*)\right) \] Hence we can estimate \(\vecf^*\) as \(\matK(X^*,X)(\matK(X,X)+\sigma^2\matI)^{-1}\vecy\) and we can quantify our uncertainty.
Small simplification: for a single test point \(x^*\), \[ f^* = \veck_*^T(\matK + \sigma^2\matI)^{-1}\vecy\qquad \sigma_{f^*}^2 = k(x^*,x^*) - \veck_*^T(\matK+\sigma^2\matI)^{-1}\veck_* \]
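To make the prediction formulas concrete, here is a minimal NumPy sketch; the Gaussian (RBF) kernel, the sinusoidal data, and the noise level are illustrative choices, not part of the notes:

```python
import numpy as np

def k(a, b, h=1.0):
    # Gaussian (RBF) kernel, an illustrative choice
    return np.exp(-(a - b) ** 2 / (2 * h**2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 20)                     # training inputs
y = np.sin(X) + 0.1 * rng.standard_normal(20)  # noisy observations
sigma2 = 0.1 ** 2

K = k(X[:, None], X[None, :])                  # K(X, X)
x_star = 0.5                                   # single test point
k_star = k(X, x_star)                          # k_* = [k(x_i, x*)]_i

w = np.linalg.solve(K + sigma2 * np.eye(len(X)), k_star)
f_star = w @ y                                 # k_*^T (K + sigma^2 I)^{-1} y
var_star = k(x_star, x_star) - w @ k_star      # posterior variance
print(f_star, var_star)
```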
A subset \(\calW\neq\emptyset\) of a vector space \(\calV\) is a vector subspace if \(\forall x,y\in\calW\) \(\forall \lambda,\mu\in\bbK\), \(\lambda x+\mu y \in\calW\).
If \(\calW\) is a vector subspace of a vector space \(\calV\), \(\calW\) is a vector space.
Let \(\set{v_i}_{i=1}^n\) be a set of vectors in a vector space \(\calV\).
For \(\set{a_i}_{i=1}^n\in\bbK^n\), \(\sum_{i=1}^na_iv_i\) is called a linear combination of the vectors \(\set{v_i}_{i=1}^n\).
The span of the vectors \(\set{v_i}_{i=1}^n\) is the set \[ \text{span}(\set{v_i}_{i=1}^n)\eqdef \{\sum_{i=1}^na_iv_i:\set{a_i}_{i=1}^n\in\bbK^n\} \]
The span of the vectors \(\set{v_i}_{i=1}^n\in\calV^n\) is a vector subspace of \(\calV\).
Let \(\set{v_i}_{i=1}^n\) be a set of vectors in a vector space \(\calV\)
\(\set{v_i}_{i=1}^n\) is linearly independent (or the vectors \(\set{v_i}_{i=1}^n\) are linearly independent ) if (and only if) \[ \sum_{i=1}^na_iv_i = 0\Rightarrow \forall i\in\intseq{1}{n}\,a_i=0 \] Otherwise the set is (or the vectors are) linearly dependent.
Any set of linearly dependent vectors contains a subset of linearly independent vectors with the same span.
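A small worked example (using SymPy's reduced row echelon form; the vectors are an illustrative choice): the pivot columns give a linearly independent subset with the same span.

```python
import sympy as sp

# Three vectors in R^3 as columns; v2 = v0 + v1, so the set is dependent
V = sp.Matrix([[1, 0, 1],
               [0, 1, 1],
               [2, 1, 3]])
_, pivots = V.rref()
print(pivots)  # (0, 1): columns 0 and 1 are independent with the same span
```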
If a vector space \(\calV\neq \set{0}\) has a finite basis with \(n\in\bbN^*\) elements, \(n\) is called the dimension of \(\calV\), denoted \(\dim{\calV}\). If \(\calV\) has no finite basis, the dimension is infinite.
Any two bases for the same finite dimensional vector space contain the same number of elements.
The properties of vector spaces seen thus far provide an algebraic structure
We are missing a topological structure to measure length and distance
\(\norm{x}\) measures a length, \(\norm{x-y}\) measures a distance
In addition to a topological and algebraic structure, what if we want to do geometry?
An inner product space over \(\bbR\) is a vector space \(\calV\) equipped with a positive definite symmetric bilinear form \(\dotp{\cdot}{\cdot}:\calV\times\calV\to\bbR\) called an inner product
An inner product space is also called a pre-Hilbert space
In an inner product space, an inner product induces a norm \(\norm{x} \eqdef \sqrt{\dotp{x}{x}}\)
A norm \(\norm{\cdot}\) is induced by an inner product on \(\calV\) iff \(\forall x,y\in\calV\) \(\norm{x}^2+\norm{y}^2 = \frac{1}{2}\left(\norm{x+y}^2+\norm{x-y}^2\right)\) If this is the case, the inner product is given by the polarization identity \[\dotp{x}{y}=\frac{1}{2}\left(\norm{x}^2+\norm{y}^2-\norm{x-y}^2\right)\]
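A quick numerical check of both identities for the Euclidean inner product on \(\bbR^5\) (random vectors, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.standard_normal(5), rng.standard_normal(5)
nx, ny = np.linalg.norm(x), np.linalg.norm(y)

# Parallelogram law for a norm induced by an inner product
assert np.isclose(nx**2 + ny**2,
                  0.5 * (np.linalg.norm(x + y)**2 + np.linalg.norm(x - y)**2))
# Polarization identity recovers the inner product from the norm
assert np.isclose(x @ y, 0.5 * (nx**2 + ny**2 - np.linalg.norm(x - y)**2))
```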
An induced norm satisfies \(\forall x,y\in\calV\) \(\norm{x+y}\leq \norm{x}+\norm{y}\)
An inner product satisfies \(\forall x,y\in\calV\) \(\dotp{x}{y}^2\leq\dotp{x}{x}\dotp{y}{y}\)
The angle between two non-zero vectors \(x,y\in\calV\) is \[ \cos\theta \eqdef \frac{\dotp{x}{y}}{\norm{x}\norm{y}} \]
Two vectors \(x,y\in\calV\) are orthogonal if \(\dotp{x}{y}=0\). We write \(x\perp y\) for simplicity.
A vector \(x\in\calV\) is orthogonal to a set \(\calS\subset\calV\) if \(\forall s\in\calS\) \(\dotp{x}{s}=0\). We write \(x\perp \calS\) for simplicity.
If \(x\perp y\) then \(\norm{x+y}^2=\norm{x}^2+\norm{y}^2\)
In infinite dimensions, things are a little bit tricky. What does the following mean? \[ x(t) = \sum_{n=1}^\infty \alpha_n\psi_n(t) \]
We need to define a notion of convergence, e.g., \[ \lim_{N\to\infty}\norm{x(t)-\sum_{n=1}^N \alpha_n\psi_n(t)}=0 \]
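For instance (a numerical sketch, with \(L^2[0,1]\), the orthonormal basis \(\psi_n(t)=\sqrt{2}\sin(n\pi t)\), and \(x(t)=t\), all illustrative choices), the \(L^2\) norm of the residual shrinks as \(N\) grows:

```python
import numpy as np

t = np.linspace(0, 1, 20_000, endpoint=False)
x = t                                          # target x(t) = t

def inner(f, g):
    return (f * g).mean()                      # <f,g> on a uniform grid of [0,1)

for N in (1, 10, 100, 1000):
    xN = np.zeros_like(t)
    for n in range(1, N + 1):
        psi = np.sqrt(2) * np.sin(n * np.pi * t)
        xN += inner(x, psi) * psi              # alpha_n = <x, psi_n>
    print(N, np.sqrt(inner(x - xN, x - xN)))   # ||x - x_N||_{L2} decreases
```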
Problems can still arise if "points are missing"; we avoid this by introducing the notion of completeness
An inner product space \(\calV\) is complete if every Cauchy sequence converges, i.e., for every \(\set{x_i}_{i\geq1}\) in \(\calV\) \[ \lim_{\min(m,n)\to\infty}\norm{x_m-x_n}=0\Rightarrow \lim_{n\to\infty}x_n = x^*\in\calV. \]
We won't worry too much about proving that spaces are complete
A complete normed vector space is a Banach space; a complete inner product space is a Hilbert space
Let \(\calH\) be a Hilbert space with inner product \(\dotp{\cdot}{\cdot}\) and induced norm \(\norm{\cdot}\); let \(\calT\) be a subspace of \(\calH\)
For \(x\in\calH\), what is the closest point \(\hat{x}\in\calT\)? How do we solve \(\min_{y\in\calT}\norm{x-y}\)?
This problem has a unique solution given by the orthogonality principle
Let \(\calX\) be a pre-Hilbert space, \(\calT\) be a subspace of \(\calX\), and \(x\in\calX\).
If there exists a vector \(m^*\in\calT\) such that \(\forall m\in\calT\) \(\norm{x-m^*}\leq \norm{x-m}\), then \(m^*\) is unique.
\(m^*\in\calT\) is the unique minimizer if and only if the error \(x-m^*\) is orthogonal to \(\calT\).
This doesn't say that \(m^*\) exists!
Let \(\calH\) be a Hilbert space, \(\calT\) be a closed subspace of \(\calH\), and \(x\in\calH\).
There exists a unique vector \(m^*\in\calT\) such that \(\forall m\in\calT\) \(\norm{x-m^*}\leq \norm{x-m}\).
\(m^*\in\calT\) is the unique minimizer if and only if the error \(x-m^*\) is orthogonal to \(\calT\)
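Numerically (least squares on \(\bbR^5\), with \(\calT\) the column span of a random matrix, an illustrative setup), the minimizer and the orthogonality of the error are easy to check:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))       # T = span of the columns of A
x = rng.standard_normal(5)

coef, *_ = np.linalg.lstsq(A, x, rcond=None)
m_star = A @ coef                     # m* = arg min_{m in T} ||x - m||

# Orthogonality principle: x - m* is orthogonal to T
assert np.allclose(A.T @ (x - m_star), 0)
```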
A collection of vectors \(\set{v_i}_{i=1}^n\) in a finite dimensional Hilbert space \(\calH\) is an orthobasis if 1) \(\text{span}(\set{v_i}_{i=1}^n)=\calH\); 2) \(\forall i\neq j\in\intseq{1}{n}\,v_i\perp v_j\); 3) \(\forall i\in\intseq{1}{n} \,\norm{v_i}=1\).
If the last condition is not met, this is just called an orthogonal basis
Orthobases are useful because we can write \(x=\sum_{i=1}^n\dotp{x}{v_i}v_i\) (what happens in a non-orthonormal basis?)
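Sanity check in \(\bbR^4\) (orthobasis obtained from a QR factorization, an illustrative choice): the expansion coefficients are just inner products.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # columns of Q: an orthobasis
x = rng.standard_normal(4)

x_rec = sum((x @ Q[:, i]) * Q[:, i] for i in range(4))  # sum_i <x, v_i> v_i
assert np.allclose(x, x_rec)
# In a non-orthonormal basis, the coefficients instead involve the inverse
# of the Gram matrix of the basis vectors.
```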
We would like to extend this idea to infinite dimensions and happily write \(x=\sum_{i=1}^\infty\dotp{x}{v_i}v_i\)
A space is separable if it contains a countable dense subset.
Separability is the key property to deal with sequences instead of collections
Any separable Hilbert space has an orthonormal basis.
Most useful Hilbert spaces are separable! We won't worry about non-separable Hilbert spaces
Key takeaway for separable Hilbert spaces
Any separable Hilbert space is isomorphic to \(\ell_2\)
A functional \(F:\calF\to\bbR\) associates a real number to each element of a Hilbert space \(\calF\)
A functional \(F:\calF\to\bbR\) is continuous at \(x\in\calF\) if \[ \forall \epsilon>0\exists\delta>0\textsf{ such that } \norm[\calF]{x-y}\leq \delta\Rightarrow \abs{F(x)-F(y)}\leq\epsilon \] If this is true for every \(x\in\calF\), \(F\) is continuous.
A functional \(F\) is linear if \(\forall a,b\in\bbR\) \(\forall x,y\in\calF\) \(F(ax+by) = aF(x)+bF(y)\).
Continuous linear functionals are much more constrained than one would imagine
A linear functional \(F:\calF\to\bbR\) is bounded if there exists \(M>0\) such that \[ \forall x\in\calF\quad\abs{F(x)}\leq M\norm[\calF]{x} \]
A linear functional on a Hilbert space that is continuous at \(0\) is bounded.
Let \(F:\calF\to\bbR\) be a linear functional on an \(n\)-dimensional Hilbert space \(\calF\).
Then there exists \(c\in\calF\) such that \(F(x)=\dotp{x}{c}\) for every \(x\in\calF\)
Linear functionals over finite-dimensional Hilbert spaces are continuous!
This is not true in infinite dimensions
Let \(F:\calF\to\bbR\) be a continuous linear functional on a (possibly infinite-dimensional) separable Hilbert space \(\calF\).
Then there exists \(c\in\calF\) such that \(F(x)=\dotp{x}{c}\) for every \(x\in\calF\)
If \(\set{\psi_n}_{n\geq 1}\) is an orthobasis for \(\calF\), then we can construct \(c\) above as \[ c\eqdef \sum_{n=1}^\infty F(\psi_n)\psi_n \]
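A finite-dimensional illustration (the functional \(F\) and the basis are illustrative choices): building the representer \(c\) from an orthobasis.

```python
import numpy as np

F = lambda x: x.sum()               # a linear functional on R^4
basis = np.eye(4)                   # orthobasis {e_n}

c = sum(F(e) * e for e in basis)    # c = sum_n F(e_n) e_n; here c = (1,1,1,1)

x = np.random.default_rng(0).standard_normal(4)
assert np.isclose(F(x), x @ c)      # F(x) = <x, c>
```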
An RKHS is a Hilbert space \(\calH\) of real-valued functions \(f:\bbR^d\to\bbR\) in which the sampling operation \(\calS_\bftau:\calH\to\bbR:f\mapsto f(\bftau)\) is continuous for every \(\bftau\in\bbR^d\).
In other words, for each \(\bftau\in\bbR^d\), there exists \(k_\bftau\in\calH\) s.t. \[ f(\bftau) = {\dotp{f}{k_\bftau}}_\calH\text{ for all } f\in\calH \]
The kernel of an RKHS is \[ k:\bbR^d\times\bbR^d\to\bbR:(\bft,\bftau)\mapsto k_{\bftau}(\bft) \] where \(k_\bftau\) is the element of \(\calH\) that defines the sampling at \(\bftau\).
A (separable) Hilbert space with orthobasis \(\set{\psi_n}_{n\geq 1}\) is an RKHS iff \(\forall \bftau\in\bbR^d\) \(\sum_{n=1}^\infty\abs{\psi_{n}(\bftau)}^2<\infty\)
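As a (numerical, illustrative) example of the criterion failing: for the \(L^2[0,1]\) orthobasis \(\psi_n(t)=\sqrt{2}\sin(n\pi t)\), the sum \(\sum_n\abs{\psi_n(\tau)}^2\) grows without bound, reflecting the fact that \(L^2[0,1]\) is not an RKHS (point evaluation is not continuous there).

```python
import numpy as np

tau = 0.3
for N in (10, 100, 1000, 10_000):
    s = sum(2 * np.sin(n * np.pi * tau) ** 2 for n in range(1, N + 1))
    print(N, s)   # grows roughly like N: the RKHS criterion fails for L2[0,1]
```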
An RKHS is just the right space to solve our problem
If \(\calH\) is an RKHS, then \[ \min_{f\in\calH}\sum_{i=1}^n\abs{y_i-f(\vecx_i)}^2+\lambda\norm[\calH]{f}^2 \] has solution \[ f = \sum_{i=1}^n\alpha_i k_{\vecx_i}\textsf{ with } \bfalpha = (\matK+\lambda\matI)^{-1}\vecy\qquad \matK=\left[k(\vecx_i,\vecx_j)\right]_{1\leq i,j\leq n} \]
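A minimal kernel ridge regression sketch of this solution; the Gaussian (RBF) kernel, the data, and the value of \(\lambda\) are illustrative assumptions:

```python
import numpy as np

def rbf(X1, X2, h=1.0):
    # Gaussian (RBF) kernel matrix, an illustrative kernel choice
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h**2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

lam = 1e-2
alpha = np.linalg.solve(rbf(X, X) + lam * np.eye(len(X)), y)  # (K + lam I)^{-1} y

X_new = np.linspace(-3, 3, 5)[:, None]
print(rbf(X_new, X) @ alpha)        # f(x) = sum_i alpha_i k(x, x_i)
```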