Mathematical Foundations of Machine Learning

Prof. Matthieu Bloch

Wednesday, October 2, 2024 (v1.0)

Last time

Last class: Monday September 30, 2024
- We showed that continuous linear functionals on Hilbert Spaces are inner products!
- This result is known as the Riesz representation theorem
Today: (finally) wrap up Reproducing Kernel Hilbert Spaces

To be effectively prepared for today's class, you should have:
1. Come to class on Monday September 30, 2024
2. Gone over slides and read the associated lecture notes
3. Submitted Homework 3
Logistics: use office hours to review for the midterm!
- Jack Hill office hours: Wednesday 11:30am-12:30pm in TSRB and hybrid
- Anuvab Sen office hours: Thursday 12pm-1pm in TSRB and hybrid
- Dr. Bloch: Friday October 04, 2024 6pm - online
Homework 4: posted, due Sunday October 20.
- We agreed to give you room to breathe, prepare for the midterm and enjoy Fall break
- However, the pace of subsequent homework will be faster (~1/week)

Midterm

Date: Wednesday October 9, 2024 3:30pm-4:45pm in usual classroom
- Please be on time and plan to finish when asked
- Open notes (slides, annotations, lecture notes, your notes) - no Internet, no AI, no textbook
Coverage
- Everything up to and including Lecture 11 on Wednesday September 25, 2024
- Representer theorem is in scope (functionals and RKHS are not in scope)
- Remember what \(B\)-splines are but don't worry about proving the detailed properties proved in Homework 1
My expectations
- You have worked on all the homework
- You have read the lecture notes
What I am thinking about testing (this is not meant to be a hint as to what will be on the test)
- Abstract Hilbert spaces: can you manipulate functions as if they were vectors?
- Orthogonality principle: can you apply it properly to solve an optimization problem?
- Least-squares regression: are you able to characterize the solution?

The big picture

For a Hilbert space \(\calF\) and \(n\) pairs \((x_i,y_i)\in\calF\times \bbR\), we know how to solve the following problem: \[ \min_{f\in\calF}\sum_{i=1}^n\abs{y_i-{\dotp{f}{x_i}}_{\calF}}^2+\lambda\norm[\calF]{f}^2 \]
We would really like to solve the following problem for \(n\) pairs \((\bfx_i,y_i)\in\bbR^d\times\bbR\) \[ \min_{f\in\calF}\sum_{i=1}^n\abs{y_i-f(\bfx_i)}^2+\lambda\norm[\calF]{f}^2 \]
The question is whether \(f(\bfx_i) = {\dotp{f}{x_i}}_{\calF}\) for some \(x_i\in\calF\) that is a function of \(\bfx_i\): can this be done?

Reproducing Kernel Hilbert Spaces (RKHSs) are specific Hilbert spaces where this happens to be true
- Hilbert space of functions where the sampling linear operation is a continuous linear functional
As usual, we're throwing definitions at our problem to make progress

Representation of (continuous) linear functionals

Let \(F:\calF\to\bbR\) be a linear functional on an \(n\)-dimensional Hilbert space \(\calF\).

There exists \(c\in\calF\) such that \(F(x)=\dotp{x}{c}\) for every \(x\in\calF\)

Linear functionals over finite dimensional Hilbert spaces are continuous!
This is not true in infinite dimensions

Let \(F:\calF\to\bbR\) be a continuous linear functional on a (possibly infinite dimensional) separable Hilbert space \(\calF\).

There exists \(c\in\calF\) such that \(F(x)=\dotp{x}{c}\) for every \(x\in\calF\)

If \(\set{\psi_n}_{n\geq 1}\) is an orthobasis for \(\calF\), then we can construct \(c\) above as \[ c\eqdef \sum_{n=1}^\infty F(\psi_n)\psi_n \]
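
A short sketch of why this \(c\) works, assuming that continuity of \(F\) lets us exchange \(F\) with the infinite sum: expanding \(x=\sum_{n\geq 1}\dotp{x}{\psi_n}\psi_n\) gives \[ F(x) = \sum_{n=1}^\infty \dotp{x}{\psi_n}F(\psi_n) = \dotp{x}{\sum_{n=1}^\infty F(\psi_n)\psi_n} = \dotp{x}{c} \] Continuity of \(F\) is also what guarantees \(\sum_{n\geq 1}\abs{F(\psi_n)}^2<\infty\), so that \(c\) is a well-defined element of the Hilbert space.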

Reproducing Kernel Hilbert Spaces

An RKHS is a Hilbert space \(\calH\) of real-valued functions \(f:\bbR^d\to\bbR\) in which the sampling operation \(\calS_\bftau:\calH\to\bbR:f\mapsto f(\bftau)\) is continuous for every \(\bftau\in\bbR^d\).

In other words, for each \(\bftau\in\bbR^d\), there exists \(k_\bftau\in\calH\) s.t. \[ f(\bftau) = {\dotp{f}{k_\bftau}}_\calH\text{ for all } f\in\calH \]

The kernel of an RKHS is \[ k:\bbR^d\times\bbR^d\to\bbR:(\bft,\bftau)\mapsto k_{\bftau}(\bft) \] where \(k_\bftau\) is the element of \(\calH\) that defines the sampling at \(\bftau\).

A (separable) Hilbert space with orthobasis \(\set{\psi_n}_{n\geq 1}\) is an RKHS iff \(\forall \bftau\in\bbR^d\) \(\sum_{n=1}^\infty\abs{\psi_{n}(\bftau)}^2<\infty\)
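
One way to see where this condition comes from, sketched under the assumption that the construction of \(c\) in the Riesz representation theorem is applied to \(F=\calS_\bftau\): since \(\calS_\bftau(\psi_n)=\psi_n(\bftau)\), \[ k_\bftau = \sum_{n=1}^\infty \psi_n(\bftau)\psi_n \qquad k(\bft,\bftau) = \sum_{n=1}^\infty \psi_n(\bftau)\psi_n(\bft) \qquad \norm[\calH]{k_\bftau}^2 = \sum_{n=1}^\infty\abs{\psi_n(\bftau)}^2 \] so \(k_\bftau\) is an element of \(\calH\) exactly when the last sum is finite.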

RKHS and non-orthogonal bases

If \(\set{\phi_n}_{n\geq 1}\) is a Riesz basis for \(\calH\), we know that every \(x\in\calH\) can be written \[ x = \sum_{n\geq 1}\alpha_n\phi_n\textsf{ with } \alpha_n\eqdef\dotp{x}{\smash{\widetilde{\phi}_n}} \] where \(\set{\widetilde{\phi}_n}_{n\geq 1}\) is the dual basis.

A (separable) Hilbert space with Riesz basis \(\set{\phi_n}_{n\geq 1}\) is an RKHS with kernel \[ k(\bft,\bftau) =\sum_{n=1}^\infty \phi_n(\bftau)\widetilde{\phi}_n(\bft) \] iff \(\forall \bftau\in\bbR^d\) \(\sum_{n=1}^\infty\abs{\phi_{n}(\bftau)}^2<\infty\)

Kernel regression

Regression problem: given \(n\) pairs \((\bfx_i,y_i)\in\bbR^d\times\bbR\), solve \[ \min_{f\in\calF}\sum_{i=1}^n\abs{y_i-f(\bfx_i)}^2+\lambda\norm[\calF]{f}^2 \]
If we restrict \(\calF\) to be an RKHS, the problem becomes \[ \min_{f\in\calF}\sum_{i=1}^n\abs{y_i-{\dotp{f}{x_i}}_{\calF}}^2+\lambda\norm[\calF]{f}^2 \]

where \(x_i\eqdef k_{\bfx_i}\) provides the mapping between \(\bbR^d\) and \(\calF\) \[ x_i:\bbR^d\to\bbR:\bft\mapsto k_{\bfx_i}(\bft) = k(\bfx_i,\bft) \]
The solution is given by \(\widehat{f} = \sum_{i=1}^n \widehat{\alpha}_i x_i\textsf{ with }\widehat{\bfalpha}\eqdef (\bfK+\lambda\bfI)^{-1}\bfy\) and \(\bfK\eqdef[K_{i,j}]_{1\leq i,j\leq n}\) with \(K_{i,j}=\dotp{x_i}{x_j}\)
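
The formula for \(\widehat{\bfalpha}\) can be sanity-checked numerically. The sketch below is only an illustration under a simplifying assumption: \(\calF\) is taken to be \(\bbR^d\) with the usual dot product, so that \(x_i\) is the data point itself and the result can be compared with classical ridge regression. The function name `kernel_ridge_coefficients` and the toy data are hypothetical.

```python
import numpy as np

def kernel_ridge_coefficients(K, y, lam):
    """Solve (K + lam * I) alpha = y for alpha."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

# Toy data (assumption): F = R^d with the dot product, so x_i is the data point.
rng = np.random.default_rng(0)
n, d, lam = 20, 3, 0.5
X = rng.standard_normal((n, d))   # rows are x_1, ..., x_n
y = rng.standard_normal(n)

K = X @ X.T                       # K[i, j] = <x_i, x_j>
alpha_hat = kernel_ridge_coefficients(K, y, lam)
f_hat = X.T @ alpha_hat           # f_hat = sum_i alpha_i x_i

# Classical ridge regression solved directly in R^d, for comparison.
f_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(np.allclose(f_hat, f_ridge))
```

Both routes give the same \(\widehat{f}\); the first one only ever touches the \(n\times n\) matrix \(\bfK\).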

Kernel regression

Kernel magic
1. \(K_{ij} = \dotp{x_i}{x_j}=\dotp{k_{\bfx_i}}{k_{\bfx_j}} = k_{\bfx_i}(\bfx_j) = k(\bfx_i,\bfx_j)\)
2. \(\widehat{f}(\bfx) = \dotp{\widehat{f}}{k_{\bfx}} = \sum_{i=1}^n\widehat{\alpha}_ik(\bfx_i,\bfx)\)
Remarks
- We solved an infinite dimensional problem using an \(n\times n\) system of equations and linear algebra
- Our solution and the evaluation only depend on the kernel; we never need to work directly in \(\calF\)
Question: can we skip \(\calF\) entirely? how do we find "good" kernels?
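
The kernel magic above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the choice of the RBF kernel, the helper names `rbf_kernel`, `fit`, and `predict`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def rbf_kernel(U, V, sigma=1.0):
    """Matrix with entries exp(-||u - v||^2 / (2 sigma^2)) for rows u of U, v of V."""
    sq_dists = (
        np.sum(U**2, axis=1)[:, None]
        + np.sum(V**2, axis=1)[None, :]
        - 2.0 * U @ V.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def fit(X, y, lam, sigma=1.0):
    """alpha = (K + lam * I)^{-1} y with K[i, j] = k(x_i, x_j)."""
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def predict(X_train, alpha, X_new, sigma=1.0):
    """f_hat(x) = sum_i alpha_i k(x_i, x), evaluated for every row x of X_new."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha

# Synthetic 1-D data (assumption): noisy samples of a sine.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

lam = 0.1
alpha = fit(X, y, lam)
pred_train = predict(X, alpha, X)
pred_new = predict(X, alpha, np.array([[0.0], [1.5]]))
```

Note that neither `fit` nor `predict` ever represents an element of \(\calF\): only kernel evaluations appear. A useful consistency check is that the training residuals satisfy \(y_i-\widehat{f}(\bfx_i)=\lambda\widehat{\alpha}_i\), which follows from \((\bfK+\lambda\bfI)\widehat{\bfalpha}=\bfy\).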

Aronszajn's theorem

An inner product kernel is a mapping \(k:\bbR^d\times\bbR^d\to\bbR\) for which there exists a Hilbert space \(\calH\) and a mapping \(\Phi:\bbR^d\to\calH\) such that \[\forall \bfu,\bfv\in\bbR^d\quad k(\bfu,\bfv)=\langle\Phi(\bfu),\Phi(\bfv)\rangle_\calH\]

A function \(k:\bbR^d\times\bbR^d\to\bbR\) is a positive semidefinite kernel if

1. \(k\) is symmetric, i.e., \(k(\bfu,\bfv)=k(\bfv,\bfu)\)
2. for all \(\{\bfx_i\}_{i=1}^N\), the Gram matrix \(\bfK\) is positive semidefinite, i.e., \[\bfa^\intercal\bfK\bfa\geq 0\text{ for all }\bfa\in\bbR^N\text{ with }\bfK=[K_{i,j}]\text{ and }K_{i,j}\eqdef k(\bfx_i,\bfx_j)\]

A function \(k:\bbR^d\times\bbR^d\to\bbR\) is an inner product kernel if and only if \(k\) is a positive semidefinite kernel.
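
The positive semidefinite condition can be probed numerically on a given set of points. The sketch below is illustrative only: a check on one random point set can expose a function that is not a valid kernel, but it cannot prove that a function is one. The helper names and the counterexample \(k(\bfu,\bfv)=-\norm{\bfu-\bfv}^2\) are assumptions chosen for the example.

```python
import numpy as np

def gram(kernel, X):
    """Gram matrix K[i, j] = kernel(x_i, x_j) for the rows x_i of X."""
    n = X.shape[0]
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

def min_eigenvalue(K):
    """Smallest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(K).min()

def rbf(u, v):
    return np.exp(-np.sum((u - v) ** 2) / 2.0)

def neg_sq_dist(u, v):
    # Symmetric, but its Gram matrix has zero trace and is nonzero,
    # so it must have a negative eigenvalue: not a valid kernel.
    return -np.sum((u - v) ** 2)

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 2))

min_eig_rbf = min_eigenvalue(gram(rbf, X))
min_eig_bad = min_eigenvalue(gram(neg_sq_dist, X))
print(min_eig_rbf >= -1e-8, min_eig_bad < 0)
```

The small tolerance on the RBF eigenvalue accounts for floating-point round-off.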

Examples

Regression using linear and quadratic functions in \(\bbR^d\)

Regression using Radial Basis Functions

Examples of kernels
- Homogeneous polynomial kernel: \(k(\bfu,\bfv) = (\bfu^\intercal\bfv)^m\) with \(m\in\bbN^*\)
- Inhomogeneous polynomial kernel: \(k(\bfu,\bfv) = (\bfu^\intercal\bfv+c)^m\) with \(c>0\), \(m\in\bbN^*\)
- Radial basis function (RBF) kernel: \(k(\bfu,\bfv) = \exp\left(-\frac{\norm{\bfu-\bfv}^2}{2\sigma^2}\right)\) with \(\sigma^2>0\)
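
To connect these examples with the definition of an inner product kernel, the sketch below writes the kernels as plain functions and checks, for \(d=2\) and \(m=2\), that the homogeneous polynomial kernel agrees with an explicit feature map \(\Phi(\bfu)=(u_1^2,\sqrt{2}u_1u_2,u_2^2)\). The function names are hypothetical and the feature map is one possible choice among several.

```python
import numpy as np

def poly_kernel(u, v, m=2, c=0.0):
    """(u^T v + c)^m; c = 0 gives the homogeneous polynomial kernel."""
    return (u @ v + c) ** m

def rbf_kernel(u, v, sigma=1.0):
    """exp(-||u - v||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma**2))

def phi_quadratic(u):
    """One explicit feature map for (u^T v)^2 when d = 2."""
    return np.array([u[0] ** 2, np.sqrt(2.0) * u[0] * u[1], u[1] ** 2])

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

k_direct = poly_kernel(u, v, m=2)                # (1*3 + 2*(-1))^2 = 1
k_feature = phi_quadratic(u) @ phi_quadratic(v)  # 9 - 12 + 4 = 1
print(np.isclose(k_direct, k_feature))
```

Evaluating the kernel directly costs one dot product in \(\bbR^d\), whereas the explicit feature map has a dimension that grows with \(d\) and \(m\); for the RBF kernel no finite dimensional feature map exists at all.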

Next time

Next class: Monday October 07, 2024

To be effectively prepared for next class, you should:
1. Go over today's slides and read the associated lecture notes
2. Start reviewing your notes for the Midterm (Wednesday October 09, 2024)
To be effectively prepared for the midterm, you should:
1. Not panic
2. Not start reviewing your notes the night before
3. Start reviewing your notes today
4. Post questions on Piazza
5. Come to office hours with questions
Optional
- Export slides for next lecture as PDF (be on the lookout for an announcement when they're ready)