Mathematical Foundations of Machine Learning

Prof. Matthieu Bloch

Monday, September 9, 2024 (v1.1)

Last time

  • Last class: Wednesday September 04, 2024
    • We talked about how to compute the coefficients of a projection (in finite dimension)
    • We saw the importance of an orthobasis and extended the definition to infinite dimension
    • We proved the continuity of the inner product and Bessel's inequality
    • We introduced the notion of separable space to reproduce some of the properties of finite dimension
  • To be effectively prepared for today's class, you should have:
    1. Gone over last week's slides and read associated lecture notes
    2. Started looking at Homework 2 (due Friday September 13, 2024)
  • Logistics
    • Office hours
      • Jack Hill: 11:30am-12:30pm on Wednesdays in TSRB and hybrid
      • Anuvab: 12pm-1pm on Thursdays in TSRB and hybrid
      • Dr. Bloch: TBA
    • Solutions to HW1 will be released shortly
    • Grading of HW1 is in progress

Towards orthobases in infinite dimension

The following properties of a countable orthonormal set \(\set{e_i}_{i\geq 1}\) in a Hilbert space \(\calH\) are equivalent (properties 3 and 4 are illustrated numerically in the sketch after the list):

  1. Finite linear combinations of elements in \(\set{e_i}_{i\geq 1}\) are dense in \(\calH\);
  2. If \(v\in\calH\) and \(\dotp{v}{e_j}=0\) for all \(j\geq 1\), then \(v=0\);
  3. If \(v\in\calH\), then \(\sum_{i=1}^n\dotp{v}{e_i}e_i\) converges to \(v\) (in the norm \(\norm{\cdot}\)) as \(n\to\infty\);
  4. If \(v\in\calH\) and \(a_k\eqdef \dotp{v}{e_k}\) for all \(k\geq 1\), then \(\norm{v}^2=\sum_{k\geq 1}\abs{a_k}^2\) (Parseval's identity).
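
A minimal numerical sketch (not from the slides) of properties 3 and 4, using the real Fourier orthobasis of \(L^2([0,2\pi))\); the test function, grid size, and truncation level are illustrative choices, and inner products are approximated by Riemann sums.

```python
import numpy as np

# Illustrate properties 3 (convergence of partial sums) and 4 (Parseval) for the
# real Fourier orthobasis of L^2([0, 2*pi)), with inner products approximated on a grid.
t = np.linspace(0.0, 2 * np.pi, 20000, endpoint=False)
dt = t[1] - t[0]
v = np.abs(t - np.pi)                      # a square-integrable test function

def inner(f, g):
    """Riemann-sum approximation of the L^2 inner product <f, g>."""
    return np.sum(f * g) * dt

# Orthonormal set: the constant function, then cos(k t)/sqrt(pi) and sin(k t)/sqrt(pi).
basis = [np.ones_like(t) / np.sqrt(2 * np.pi)]
for k in range(1, 51):
    basis.append(np.cos(k * t) / np.sqrt(np.pi))
    basis.append(np.sin(k * t) / np.sqrt(np.pi))

coeffs = np.array([inner(v, e) for e in basis])        # a_k = <v, e_k>
partial_sum = sum(a * e for a, e in zip(coeffs, basis))

print("||v||^2             =", inner(v, v))
print("sum_k |a_k|^2       =", np.sum(coeffs ** 2))    # property 4: approaches ||v||^2
print("||v - partial sum|| =", np.sqrt(inner(v - partial_sum, v - partial_sum)))  # property 3: small
```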

A space is separable if it contains a countable dense subset.

  • Separability is the key property that lets us work with countable sequences instead of arbitrary collections

Any separable Hilbert space has an orthonormal basis.

  • Many useful Hilbert spaces are separable! We won't worry about non-separable Hilbert spaces

Isomorphism with \(\ell_2\)

  • Key takeaways for separable Hilbert spaces
    • \(x=\sum_{i=1}^\infty\dotp{x}{v_i}v_i\) is perfectly well defined for an orthonormal basis \(\set{v_i}_{i\geq 1}\)
    • Parseval's identity tells us that \(\norm{x}^2=\sum_{k\geq 1}\abs{\dotp{x}{v_k}}^2\).
    • We need not worry about the nature of \(\calH\): we can work directly with the coefficients \(\dotp{x}{v_i}\)

Any separable Hilbert space is isomorphic to \(\ell_2\); the isomorphism preserves the norm and inner product.
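
A finite-dimensional sketch (an analogue, not the construction from class): mapping \(x\) to its coefficient sequence with respect to an orthonormal basis of \(\bbR^n\) preserves norms and inner products, which is exactly what the isomorphism with \(\ell_2\) does coordinate by coordinate.

```python
import numpy as np

# Map x to its coefficient vector with respect to an orthonormal basis of R^n
# and check that norms and inner products are preserved.
rng = np.random.default_rng(0)
n = 8
V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # columns form an orthonormal basis

x = rng.standard_normal(n)
y = rng.standard_normal(n)
a = V.T @ x                                        # coefficient sequence of x
b = V.T @ y                                        # coefficient sequence of y

print(np.allclose(x @ y, a @ b))                            # inner product preserved
print(np.allclose(np.linalg.norm(x), np.linalg.norm(a)))    # norm preserved
```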

Non-orthogonal bases in finite dimension

Let \(\set{v_i}_{i=1}^n\) be a linearly independent set in a Hilbert space \(\calH\) of dimension \(n\). Then, for any \(x\in\calH\), \(x=\sum_{i=1}^n\alpha_iv_i\) for some \(\bfalpha\in\bbR^n\). In addition, there exist \(A,B>0\) such that \[ A\norm[2]{\bfalpha}^2 \leq \norm[\calH]{x}^2\leq B\norm[2]{\bfalpha}^2 \]

  • Remarks
    • Inequality is tight on both sides
    • For orthobases, \(A=B=1\)
  • Interpretation:
    • The values of \(A\) and \(B\) govern the stability of the representation (see the numerical sketch below)
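
A numerical sketch (not from the slides), assuming the standard fact that the tightest constants are the extreme eigenvalues of the Gram matrix \(\matG\) with \([\matG]_{ij}=\dotp{v_j}{v_i}\): since \(\norm[\calH]{x}^2=\sum_{i,j}\alpha_i\alpha_j\dotp{v_j}{v_i}\), every ratio \(\norm[\calH]{x}^2/\norm[2]{\bfalpha}^2\) lies between \(\lambda_{\min}(\matG)\) and \(\lambda_{\max}(\matG)\).

```python
import numpy as np

# For a non-orthogonal basis of R^n, every ratio ||x||^2 / ||alpha||^2 with
# x = V @ alpha falls between the extreme eigenvalues of the Gram matrix G = V^T V
# (assumed fact: these eigenvalues are the tightest constants A and B).
rng = np.random.default_rng(1)
n = 5
V = rng.standard_normal((n, n))          # columns v_1, ..., v_n (linearly independent a.s.)
G = V.T @ V                              # Gram matrix, [G]_ij = <v_j, v_i>
A, B = np.linalg.eigvalsh(G)[[0, -1]]    # candidate tight constants

ratios = []
for _ in range(10000):
    alpha = rng.standard_normal(n)
    x = V @ alpha
    ratios.append(np.linalg.norm(x) ** 2 / np.linalg.norm(alpha) ** 2)

print(f"A = {A:.4f} <= min ratio = {min(ratios):.4f}")
print(f"B = {B:.4f} >= max ratio = {max(ratios):.4f}")
```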

Dual basis in finite dimension

  • Recall from orthobases:
    • Perfectly stable representation \(A=B=1\)
    • Efficient computation of representations: \(\alpha_i=\dotp{x}{v_i}\)

For any \(x\in\calH\) with basis \(\set{v_i}_{i=1}^n\), we have \[ x=\sum_{i=1}^n\alpha_iv_i\qquad\text{with}\qquad\bfalpha = \matG^{-1}\mat{c}{\dotp{x}{v_1}\\\dotp{x}{v_2}\\\vdots\\\dotp{x}{v_n}} \] where \(\matG\) is the Gram matrix with entries \([\matG]_{ij}=\dotp{v_j}{v_i}\). There also exists a basis \(\set{\tilde{v}_i}_{i=1}^n\), called the dual basis, such that \(\alpha_i=\dotp{x}{\tilde{v}_i}\).
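
A sketch in \(\bbR^n\) (not from the slides) of the formula above and of the dual basis: collecting the \(v_i\) as the columns of a matrix \(V\), one standard construction takes the columns of \(V\matG^{-1}\) as the dual basis vectors \(\tilde{v}_i\), which satisfy \(\dotp{v_i}{\tilde{v}_j}=\delta_{ij}\); the variable names are illustrative.

```python
import numpy as np

# Compute expansion coefficients via the Gram matrix, then build the dual basis
# and verify alpha_i = <x, v~_i> and biorthogonality <v_i, v~_j> = delta_ij.
rng = np.random.default_rng(2)
n = 4
V = rng.standard_normal((n, n))        # columns are the basis vectors v_i
G = V.T @ V                            # Gram matrix, [G]_ij = <v_j, v_i>

x = rng.standard_normal(n)
alpha = np.linalg.solve(G, V.T @ x)    # alpha = G^{-1} [<x, v_1>, ..., <x, v_n>]^T
print(np.allclose(V @ alpha, x))       # x = sum_i alpha_i v_i

V_dual = V @ np.linalg.inv(G)          # columns are the dual basis vectors v~_i
print(np.allclose(V_dual.T @ x, alpha))        # alpha_i = <x, v~_i>
print(np.allclose(V.T @ V_dual, np.eye(n)))    # biorthogonality
```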

Non-orthogonal bases in infinite dimension

\(\set{v_i}_{i=1}^\infty\) is a Riesz basis for a Hilbert space \(\calH\) if \(\text{cl}(\text{span}(\set{v_i}_{i=1}^\infty))=\calH\) and there exist \(A,B>0\) such that \[ A\sum_{i=1}^\infty\alpha_i^2\leq \norm[\calH]{\sum_{i=1}^\infty\alpha_iv_i}^2\leq B\sum_{i=1}^\infty\alpha_i^2 \] uniformly for all sequences \(\set{\alpha_i}_{i\geq 1}\) with \(\sum_{i\geq 1}\alpha_i^2<\infty\).

  • In infinite dimension, the existence of \(A,B>0\) is not automatic (see the sketch below for an orthogonal set for which no lower bound \(A>0\) works).
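
A sketch (not from the slides) of how the lower bound can fail: take \(v_k = e_k/k\) in \(\ell_2\). The vectors are orthogonal and linearly independent, but the best constant \(A\) over the first \(N\) vectors is the smallest eigenvalue of their Gram matrix, which vanishes as \(N\) grows, so no uniform \(A>0\) exists.

```python
import numpy as np

# For v_k = e_k / k (orthogonal, linearly independent), the best lower constant A
# over the first N vectors is lambda_min of the Gram matrix; it tends to 0 as N grows,
# so {v_k} satisfies no uniform lower Riesz bound.
for N in (5, 50, 500):
    V = np.diag(1.0 / np.arange(1, N + 1))   # columns v_k = e_k / k, truncated to R^N
    G = V.T @ V                              # Gram matrix (diagonal, entries 1/k^2)
    print(f"N = {N:4d}   A_N = lambda_min(G) = {np.linalg.eigvalsh(G)[0]:.2e}")
```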

Dual basis in infinite dimension

  • Computing the expansion on a Riesz basis is not as simple in infinite dimension: the Gram matrix becomes "infinite"

  • The Gramian is a linear operator \[ \calG:\ell_2(\bbZ)\to\ell_2(\bbZ): \bfx\mapsto \bfy\text{ with }[\calG(\bfx)]_n\eqdef y_n=\sum_{\ell=-\infty}^\infty\dotp{v_\ell}{v_n}x_\ell \]

  • Fact: \(\calG\) is invertible, i.e., there exists another linear operator \(\calG^{-1}:\ell_2(\bbZ)\to\ell_2(\bbZ)\) such that \[ \calG^{-1}(\calG(\bfx)) = \bfx \] We can replicate what we did in finite dimension (see the sketch below)!
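
A sketch (not from the slides) with a concrete Riesz basis of \(\ell_2(\bbZ)\), \(v_n = e_n + \tfrac{1}{2}e_{n+1}\) (an illustrative choice, not one from class): its Gramian is a banded Toeplitz operator, which we approximate by a finite truncation to recover a compactly supported coefficient sequence.

```python
import numpy as np

# Riesz basis v_n = e_n + 0.5 e_{n+1} of l_2(Z): the Gramian has entries
# <v_l, v_n> = 1.25 if l = n, 0.5 if |l - n| = 1, and 0 otherwise.
# We truncate it to indices -N..N and invert the truncation to recover coefficients.
N = 50
size = 2 * N + 1
G = 1.25 * np.eye(size) + 0.5 * (np.eye(size, k=1) + np.eye(size, k=-1))

rng = np.random.default_rng(3)
x = np.zeros(size)
x[size // 2 - 5 : size // 2 + 5] = rng.standard_normal(10)   # compactly supported coefficients

y = G @ x                          # y_n = sum_l <v_l, v_n> x_l
x_rec = np.linalg.solve(G, y)      # finite-dimensional stand-in for the inverse operator
print(np.max(np.abs(x_rec - x)))   # ~ 0: coefficients recovered
```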

Next time

  • Next class: Wednesday September 11, 2024
    • We will talk about solving regression problems
    • This will highlight why linear algebra concepts are relevant for machine learning
  • To be effectively prepared for next class, you should:
    1. Go over today's slides and read associated lecture notes
    2. Work on Homework 2 (due date: Friday September 13, 2024)
  • Optional
    • Export slides for next lecture as PDF (be on the lookout for an announcement when they're ready)