Mathematical Foundations of Machine Learning

Prof. Matthieu Bloch

Wednesday, August 21, 2024 (v1.1)

Enrollment updates

  • ISyE and BME sections were added to Canvas
  • Enrollment update
    • 141 students across 9 sections
    • Will likely be some movement this week
    • See here to request permits
    • All restrictions are lifted on Friday August 23, 2024
    • Email me if you need the materials

Last time

  • Last class: Monday August 19, 2024
    • We covered class logistics
    • We discussed the problem of representing functions and introduced the notion of basis expansion
    • We talked briefly about how to write proofs
  • To be effectively prepared for today's class, you should have:
    1. Read the syllabus in detail
    2. Gone over the previous slides and read associated lecture notes
    3. Made sure you have access to Canvas, Piazza, Gradescope (if you could enroll)
    4. Started Homework 0 (due date: Monday August 26, 2024)

Example: Lagrange Polynomials

  • Question: How do we find a basis expansion given \(\set{(x_i,y_i)}_{i=1}^n\)?

  • Lagrange polynomials \[ p(x) = \sum_{j=1}^{n} \alpha_j L_j(x)\qquad L_j(x)\eqdef \prod_{1\leq i\leq n, i\neq j}\frac{x-x_i}{x_j-x_i} \]

Given \(n\geq 1\) points \(\set{(x_i,y_i)}_{i=1}^n\) with distinct \(x_i\), there exists a unique interpolating polynomial of degree at most \(n-1\); it is the lowest-degree interpolating polynomial.
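The formula above can be sketched directly in Python; this is a minimal numpy illustration (function names are mine, not from the course materials):

```python
import numpy as np

def lagrange_basis(xs, j, x):
    """Evaluate L_j(x) = prod_{i != j} (x - x_i) / (x_j - x_i)."""
    terms = [(x - xs[i]) / (xs[j] - xs[i]) for i in range(len(xs)) if i != j]
    return np.prod(terms)

def lagrange_interpolate(xs, ys, x):
    """p(x) = sum_j alpha_j L_j(x); interpolation forces alpha_j = y_j."""
    return sum(ys[j] * lagrange_basis(xs, j, x) for j in range(len(xs)))

# Three points on the parabola y = x^2: the degree-2 interpolant recovers it.
xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]
p_half = lagrange_interpolate(xs, ys, 0.5)   # exact interpolation gives 0.25
```

Note that \(L_j(x_i)=\delta_{ij}\) by construction, which is why the coefficients \(\alpha_j\) are simply the values \(y_j\).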

Example: Polynomial splines

Given \(n\geq 1\) points \(\set{(x_i,y_i)}_{i=1}^n\) with distinct \(x_i\), a polynomial spline \(p\) of order \(\ell\) is a function such that

  • \(p(x_i)=y_i\) for \(i\in\intseq{1}{n}\)
  • \(p\) is an \(\ell\)th-order polynomial between consecutive points \(x_i\) and \(x_{i+1}\)
  • \(p(x)\) has \(\ell-1\) continuous derivatives at the points \(\set{x_i}_{i=1}^n\)
  • B-splines: the splines can be viewed as a linear basis expansion on a basis called B-splines
    • What are the B-splines for \(\ell=0\)?
    • What are the B-splines for \(\ell=1\)?
    • What are the B-splines for \(\ell=2\)?
  • Moving forward, we will need linear algebra to describe this precisely and systematically
    • How do we compute a linear basis expansion?
    • What is a good basis for machine learning?
    • How do we measure the quality of approximation?
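As a partial answer to the first two questions, the degree-0 B-spline is the box function and the degree-1 B-spline is the hat (triangle) function; a numpy sketch (function names are mine):

```python
import numpy as np

def b0(t):
    """Degree-0 B-spline: the box function, indicator of [0, 1)."""
    return np.where((0 <= t) & (t < 1), 1.0, 0.0)

def b1(t):
    """Degree-1 B-spline: the hat function on [0, 2] (b0 convolved with itself)."""
    return np.where((0 <= t) & (t < 1), t,
           np.where((1 <= t) & (t < 2), 2 - t, 0.0))

# The integer shifts {b1(t - k)} sum to 1 away from the boundary
# (a partition of unity), which makes them a convenient spline basis.
t = np.linspace(1.0, 3.0, 5)
total = sum(b1(t - k) for k in range(-1, 4))   # shifts overlapping [1, 3]
```

The degree-2 B-spline would be obtained by one more convolution with the box, giving a piecewise-quadratic bump supported on \([0,3]\).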

Vector (linear) space

  • You are probably used to working in \(\bbR^n\) … we'll deal with weirder objects

A vector space \(\calV\) over a field \(\bbK\) consists of a set \(\calV\) of vectors, a closed addition rule \(+\) and a closed scalar multiplication \(\cdot\) such that the following eight axioms are satisfied:

  1. \(\forall x,y\in\calV\) \(x+y=y+x\) (commutativity)
  2. \(\forall x,y,z\in\calV\) \(x+(y+z)=(x+y)+z\) (associativity)
  3. \(\exists 0\in\calV\) such that \(\forall x\in\calV\) \(x+0=x\) (identity element)
  4. \(\forall x\in\calV\) \(\exists y\in\calV\) such that \(x+y=0\) (inverse element)
  5. \(\forall x\in\calV\) \(1\cdot x= x\) (multiplicative identity)
  6. \(\forall \alpha, \beta\in\bbK\) \(\forall x\in\calV\) \(\alpha\cdot(\beta x)=(\alpha\cdot\beta)\cdot x\) (associativity)
  7. \(\forall \alpha, \beta\in\bbK\) \(\forall x\in\calV\) \((\alpha+\beta)x = \alpha x+\beta x\) (distributivity)
  8. \(\forall \alpha\in\bbK\) \(\forall x,y\in\calV\) \(\alpha(x+y) = \alpha x+\alpha y\) (distributivity)
  • \(0\in\calV\) is unique
  • Every \(x\in\calV\) has a unique inverse
  • \(0\cdot x = 0\)
  • The inverse of \(x\in\calV\) is \((-1)\cdot x\eqdef -x\)

Examples of vector spaces

  • Which of these spaces are vector spaces?

\[ \calV_1\eqdef \left\{\mat{ccc}{x_1&\cdots&x_n}^\intercal:\set{x_i}_{i=1}^n\in\bbR^n\right\} \]

\[ \calV_2\eqdef \left\{f:[a,b]\to\bbR:f \textsf{ continuous and bounded}\right\} \]

\[ \calV_3\eqdef \left\{\mat{ccc}{x_1&\cdots&x_n}^\intercal:\set{x_i}_{i=1}^n\in\bbF_2^n\right\} \]

\[ \calV_4\eqdef \left\{f:[a,b]\to\bbR:f \textsf{ continuous and } f(x)<2 \textsf{ for all } x\right\} \]

Subspaces

A subset \(\calW\) of a vector space \(\calV\) is a vector subspace if \(\forall x,y\in\calW\) \(\forall \lambda,\mu\in\bbK\) \(\lambda x+\mu y \in\calW\).

If \(\calW\) is a vector subspace of a vector space \(\calV\), \(\calW\) is a vector space.

  • Which of these examples correspond to subspaces?

\(\calV_1=\bbR^5\), \(\calW_1\eqdef\set{x\in\calV_1: x_4=0,x_5=0}\)

\(\calV_2=\bbR^5\), \(\calW_2\eqdef\set{x\in\calV_2: x_4=1,x_5=0}\)

\(\calV_3=\calC([0,1])\), \(\calW_3\eqdef\set{x\in\calV_3: x\textsf{ is a polynomial of degree at most }p}\)

\(\calV_4=\bbR^n\), \(\calW_4\eqdef\set{x:x\textsf{ has at most 5 non-zero components}}\)
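The subspace criterion can be tested numerically; a small sketch (membership tests are mine) showing that \(\calW_1\) is closed under addition while \(\calW_2\) is not:

```python
import numpy as np

def in_W1(x):
    """W1: vectors in R^5 with x4 = 0 and x5 = 0 (1-indexed as on the slide)."""
    return x[3] == 0 and x[4] == 0

def in_W2(x):
    """W2: vectors in R^5 with x4 = 1 and x5 = 0."""
    return x[3] == 1 and x[4] == 0

u = np.array([1.0, 2.0, 3.0, 0.0, 0.0])
v = np.array([4.0, 5.0, 6.0, 0.0, 0.0])
w1_closed = in_W1(u + v)          # W1 is closed under addition

a = np.array([1.0, 0.0, 0.0, 1.0, 0.0])
b = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
w2_closed = in_W2(a + b)          # (a + b)_4 = 2, so the sum leaves W2
```

A single failing pair is enough to rule out \(\calW_2\): the constraint \(x_4=1\) is an affine, not linear, condition.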

Linear combinations and span

  • Let \(\set{v_i}_{i=1}^n\) be a set of vectors in a vector space \(\calV\).

For \(\set{a_i}_{i=1}^n\in\bbK^n\), \(\sum_{i=1}^na_iv_i\) is called a linear combination of the vectors \(\set{v_i}_{i=1}^n\).

The span of the vectors \(\set{v_i}_{i=1}^n\) is the set \[ \text{span}(\set{v_i}_{i=1}^n)\eqdef \{\sum_{i=1}^na_iv_i:\set{a_i}_{i=1}^n\in\bbK^n\} \]

The span of the vectors \(\set{v_i}_{i=1}^n\in\calV^n\) is a vector subspace of \(\calV\).

Subspaces in \(\bbR^3\)

\(\calM=\set{b_0(t-k):k\in\set{0,1,2,3}}\) where \(b_0(t)=\indic{0\leq t \leq 1}\). What is the span of \(\calM\)?
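A linear combination \(\sum_k a_k b_0(t-k)\) takes the constant value \(a_k\) on each interval \([k,k+1)\), so the span is the set of piecewise-constant functions on \([0,4)\) with integer breakpoints. A numpy sketch of this (using a half-open indicator to avoid double counting at the breakpoints):

```python
import numpy as np

def b0(t):
    """Box function b_0(t), indicator of the interval (half-open here)."""
    return np.where((0 <= t) & (t < 1), 1.0, 0.0)

# sum_k a_k b0(t - k) is constant on each [k, k+1): evaluating at the
# interval midpoints recovers exactly the coefficients a_k.
coeffs = [2.0, -1.0, 0.5, 3.0]
t = np.array([0.5, 1.5, 2.5, 3.5])
f = sum(a * b0(t - k) for k, a in enumerate(coeffs))
```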

Linear independence

  • Let \(\set{v_i}_{i=1}^n\) be a set of vectors in a vector space \(\calV\)

\(\set{v_i}_{i=1}^n\) is linearly independent (or the vectors \(\set{v_i}_{i=1}^n\) are linearly independent) if (and only if) \[ \sum_{i=1}^na_iv_i = 0\Rightarrow \forall i\in\intseq{1}{n}\,a_i=0 \] Otherwise the set is (or the vectors are) linearly dependent.

\[ \calV\eqdef\bbR^3\quad \bfv_1=\mat{ccc}{2&1&0}^\intercal, \bfv_2=\mat{ccc}{1&1&0}^\intercal, \bfv_3=\mat{ccc}{1&2&0}^\intercal \]

\[ \calV\eqdef\calC([0,1])\quad v_1=\cos(2\pi t), v_2=\sin(2\pi t), v_3=2\cos(2\pi t +\frac{\pi}{3}) \]

Any finite set of linearly dependent vectors contains a subset of linearly independent vectors with the same span.
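For the \(\bbR^3\) example above, independence can be checked by stacking the vectors as columns and computing the matrix rank; a numpy sketch:

```python
import numpy as np

# Columns are v1 = (2,1,0), v2 = (1,1,0), v3 = (1,2,0) from the example.
# The set is linearly independent iff the matrix has full column rank.
V = np.array([[2.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [0.0, 0.0, 0.0]])
rank = np.linalg.matrix_rank(V)          # 2 < 3: the set is linearly dependent

# One explicit dependence relation: v1 - 3 v2 + v3 = 0.
residual = V @ np.array([1.0, -3.0, 1.0])
```

Dropping any one of the three vectors leaves a linearly independent pair with the same span, as the statement above guarantees.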

Bases

A basis of a vector subspace \(\calW\) of a vector space \(\calV\) is a countable set of vectors \(\calB\) such that:

  1. \(\text{span}(\calB)=\calW\)
  2. \(\calB\) is linearly independent

If a vector space \(\calV\neq \set{0}\) has a finite basis with \(n\in\bbN^*\) elements, \(n\) is called the dimension of \(\calV\), denoted \(\dim{\calV}\). If \(\calV\) has no finite basis, its dimension is infinite.

Any two bases for the same finite dimensional vector space contain the same number of elements.

  • You should be somewhat familiar with bases (at least in \(\bbR^n\)):
    • the representation of a vector on a basis is unique
    • every subspace has a basis
    • having a basis reduces the operations on vectors to operations on their components
  • Things sort of work in infinite dimensions, but we have to be a bit more careful
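The uniqueness of the representation on a basis can be seen concretely in \(\bbR^3\): the coordinates of \(x\) on a basis \(\calB\) (columns of a matrix \(B\)) are the unique solution of a linear system. A sketch, with \(B\) and \(x\) chosen arbitrarily:

```python
import numpy as np

# Columns of B form an (assumed) basis of R^3; the coordinates of x on this
# basis solve B a = x. Uniqueness holds because B is invertible, which is
# exactly linear independence of the basis vectors.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
x = np.array([2.0, 3.0, 1.0])

a = np.linalg.solve(B, x)        # coordinates of x in the basis B
reconstructed = B @ a            # recombining the basis vectors recovers x
```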

Next time

  • Next class: Monday August 26, 2024 3:30pm
    • We will talk about norms and inner products
    • We will work towards infinite dimensional spaces of functions
    • We will talk about the notion of completeness
  • To be effectively prepared for next class, you should:
    1. Go over today's slides and read associated lecture notes
    2. Make sure you have access to Canvas, Piazza, Gradescope
    3. Work on Homework 0 (due date: Monday August 26, 2024)
  • Optional
    • Export slides for next lecture as PDF (be on the lookout for an announcement when they're ready)

Inner product and norm

An inner product space over \(\bbR\) is a vector space \(\calV\) equipped with a positive definite symmetric bilinear form \(\dotp{\cdot}{\cdot}:\calV\times\calV\to\bbR\) called an inner product

  • An inner product space is also called a pre-Hilbert space

An inner product satisfies the Cauchy–Schwarz inequality: \(\forall x,y\in\calV\) \(\dotp{x}{y}^2\leq\dotp{x}{x}\dotp{y}{y}\)
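A quick numerical sanity check of this inequality for the Euclidean inner product on \(\bbR^{10}\) (vectors drawn at random; equality holds iff the vectors are collinear):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10)
y = rng.standard_normal(10)

lhs = np.dot(x, y) ** 2                  # <x, y>^2
rhs = np.dot(x, x) * np.dot(y, y)        # <x, x> <y, y>
holds = lhs <= rhs                       # Cauchy-Schwarz

# Equality case: y = 2x is collinear with x.
eq = np.isclose(np.dot(x, 2 * x) ** 2,
                np.dot(x, x) * np.dot(2 * x, 2 * x))
```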

A norm on a vector space \(\calV\) over \(\bbR\) is a function \(\norm{\cdot}:\calV\to\bbR\) that satisfies:

  • Positive definiteness: \(\forall x\in\calV\) \(\norm{x}\geq 0\) with equality iff \(x=0\)
  • Homogeneity: \(\forall x\in\calV\) \(\forall\alpha\in\bbR\) \(\norm{\alpha x}=\abs{\alpha}\norm{x}\)
  • Subadditivity: \(\forall x,y\in\calV\) \(\norm{x+y}\leq \norm{x}+\norm{y}\)

\[\bfx\in\bbR^d\qquad\norm[0]{\bfx}\eqdef\card{\set{i:x_i\neq 0}}\quad\norm[1]{\bfx}\eqdef\sum_{i=1}^d\abs{x_i}\quad \norm[2]{\bfx}\eqdef\sqrt{\sum_{i=1}^d x_i^2}\]
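The three quantities above are straightforward to compute; a numpy sketch (note that \(\norm[0]{\cdot}\), despite the name, fails homogeneity and so is not actually a norm):

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0, 0.0])

l0 = np.count_nonzero(x)            # "l0 norm": number of non-zero components
l1 = np.sum(np.abs(x))              # l1 norm: sum of absolute values
l2 = np.sqrt(np.sum(x ** 2))        # l2 (Euclidean) norm

# Homogeneity fails for l0: ||a x||_0 = ||x||_0 for any a != 0,
# instead of |a| ||x||_0 as the definition requires.
```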

Induced norm and orthogonality

In an inner product space, an inner product induces a norm \(\norm{x} \eqdef \sqrt{\dotp{x}{x}}\)

A norm \(\norm{\cdot}\) on \(\calV\) is induced by an inner product iff it satisfies the parallelogram law: \(\forall x,y\in\calV\) \(\norm{x}^2+\norm{y}^2 = \frac{1}{2}\left(\norm{x+y}^2+\norm{x-y}^2\right)\)

If this is the case, the inner product is given by the polarization identity \[\dotp{x}{y}=\frac{1}{2}\left(\norm{x}^2+\norm{y}^2-\norm{x-y}^2\right)\]
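The characterization above can be checked numerically in \(\bbR^2\): the \(\ell_2\) norm satisfies it (it is induced by the dot product) while the \(\ell_1\) norm does not; the polarization identity then recovers the inner product. A sketch:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

def parallelogram_holds(norm):
    """Check ||x||^2 + ||y||^2 = (||x+y||^2 + ||x-y||^2) / 2 for this pair."""
    lhs = norm(x) ** 2 + norm(y) ** 2
    rhs = 0.5 * (norm(x + y) ** 2 + norm(x - y) ** 2)
    return bool(np.isclose(lhs, rhs))

l2_ok = parallelogram_holds(lambda v: np.sqrt(np.sum(v ** 2)))  # l2: induced
l1_ok = parallelogram_holds(lambda v: np.sum(np.abs(v)))        # l1: not induced

# Polarization identity for the l2 norm recovers <x, y> (here 0):
ip = 0.5 * (np.sum(x**2) + np.sum(y**2) - np.sum((x - y)**2))
```

One counterexample pair suffices to show a norm is not induced by any inner product.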

Two vectors \(x,y\in\calV\) are orthogonal if \(\dotp{x}{y}=0\). We write \(x\perp y\) for simplicity.

A vector \(x\in\calV\) is orthogonal to a set \(\calS\subset\calV\) if \(\forall s\in\calS\) \(\dotp{x}{s}=0\). We write \(x\perp \calS\) for simplicity.

If \(x\perp y\) then \(\norm{x+y}^2=\norm{x}^2+\norm{y}^2\)
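This Pythagorean identity is immediate to verify for a concrete orthogonal pair in \(\bbR^3\):

```python
import numpy as np

# Pythagorean identity: for orthogonal x and y, ||x + y||^2 = ||x||^2 + ||y||^2.
x = np.array([3.0, 0.0, 0.0])
y = np.array([0.0, 4.0, 0.0])

dot_xy = np.dot(x, y)                    # 0: x and y are orthogonal
lhs = np.sum((x + y) ** 2)               # ||x + y||^2
rhs = np.sum(x ** 2) + np.sum(y ** 2)    # ||x||^2 + ||y||^2
```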

Hilbert space

A Cauchy sequence in \(\calV\) is a sequence \(\set{x_n}_{n\geq 0}\) such that \(\forall\epsilon>0\) \(\exists N\in\bbN^*\) such that \(\forall n,m>N\) \(\norm{x_n-x_m}\leq \epsilon\)

A Hilbert space \(\calH\) is a complete inner product space, i.e., an inner product space in which every Cauchy sequence in \(\calH\) converges to a point in \(\calH\)

  • Intuition: a Hilbert space has no "missing points"
  • A finite dimensional inner product space is complete (many Hilbert spaces are infinite dimensional!)

Let \(\calH\) be a Hilbert space and \(\calM\) a closed subspace of \(\calH\). For any \(x\in\calH\), \(\exists\) a unique \(m^*\in\calM\) for which \[ \forall m\in\calM\quad \norm{x-m^*}\leq\norm{x-m} \] Furthermore \(m^*\) is the unique vector \(m\in\calM\) such that \(x-m\perp \calM\).

The vector \(m^*\) is called the orthogonal projection of \(x\) onto \(\calM\).
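When \(\calM\) is the column span of a matrix \(A\) in \(\bbR^4\), the projection \(m^*=Aa^*\) is found by least squares, and the orthogonality characterization \(x-m^*\perp\calM\) can be checked directly. A numpy sketch, with \(A\) and \(x\) chosen arbitrarily:

```python
import numpy as np

# M = span of the columns of A, a closed subspace of R^4.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])
x = np.array([1.0, 2.0, 3.0, 4.0])

# m* = A a* with a* minimizing ||x - A a||: the least-squares solution.
a_star, *_ = np.linalg.lstsq(A, x, rcond=None)
m_star = A @ a_star

# Characterization from the theorem: the residual x - m* is orthogonal to M,
# i.e. orthogonal to every column of A.
residual_perp = A.T @ (x - m_star)
```

Orthogonality of the residual to the columns of \(A\) is exactly the normal equations \(A^\intercal(x-Aa^*)=0\).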