Mathematical Foundations of Machine Learning

Prof. Matthieu Bloch

Wednesday, October 23, 2024 (v1.0)

Last time

  • Last class: Monday October 21, 2024
    • We talked about the spectral theorem
    • We are making our way towards the SVD
  • Today: We will talk about the singular value decomposition
  • To be effectively prepared for today's class, you should have:
    1. Gone over slides and read associated lecture notes here
    2. Submitted Homework 4 and started working on Homework 5
  • Logistics: use office hours to prepare for Homework 4
    • Jack Hill office hours: Wednesday 11:30am-12:30pm in TSRB and hybrid
    • Anuvab Sen office hours: Thursday 12pm-1pm in TSRB and hybrid
    • Dr. Bloch: Friday October 25, 2024 6pm
  • Homework 5: due Tuesday October 22, 2024
    • Don't start accumulating delays, Homework 6 is coming

Spectral theorem

Every complex matrix \(\matA\) has at least one complex eigenvector, and every real symmetric matrix has real eigenvalues and at least one real eigenvector.

Every matrix \(\matA\in\bbC^{n\times n}\) is unitarily similar to an upper triangular matrix, i.e., \[ \bfA = \bfV\boldsymbol{\Delta}\bfV^\dagger \] with \(\boldsymbol{\Delta}\) upper triangular and \(\bfV^\dagger=\bfV^{-1}\).

Every Hermitian matrix is unitarily similar to a real-valued diagonal matrix.

  • Note that if \(\matA = \matV\matD\matV^\dagger\) then \(\matA = \sum_{i=1}^n\lambda_i \vecv_i\vecv_i^\dagger\)
  • How about real-valued matrices \(\matA\in\bbR^{n\times n}\)?
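For a real symmetric matrix, the decomposition \(\matA=\matV\matD\matV^\intercal\) and its rank-one expansion can be checked numerically. A minimal sketch with NumPy (the matrix here is a random example, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B + B.T                      # real symmetric matrix

# Spectral theorem: A = V diag(lams) V^T with V orthogonal
lams, V = np.linalg.eigh(A)      # eigh is the routine for symmetric/Hermitian matrices

# Rank-one expansion: A = sum_i lam_i v_i v_i^T
A_rebuilt = sum(l * np.outer(v, v) for l, v in zip(lams, V.T))

assert np.allclose(V.T @ V, np.eye(4))   # eigenvectors are orthonormal
assert np.allclose(A, A_rebuilt)
```

Note that `np.linalg.eigh` returns real eigenvalues in ascending order, which is the reverse of the convention \(\lambda_1\geq\cdots\geq\lambda_n\) used below.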

Symmetric positive definite matrices

A symmetric matrix \(\matA\) is positive definite if it has positive eigenvalues, i.e., \(\forall i\in\set{1,\cdots,n}\quad\lambda_i>0\).

A symmetric matrix \(\matA\) is positive semidefinite if it has nonnegative eigenvalues, i.e., \(\forall i\in\set{1,\cdots,n}\quad\lambda_i\geq 0\).

  • Convention: \(\lambda_1\geq \lambda_2\geq \cdots \geq \lambda_n\)
  • Variational form of extreme eigenvalues for symmetric positive definite matrices \(\matA\) \[ \lambda_1 = \max_{\vecx\in\bbR^n:\norm[2]{\vecx}=1}\vecx^\intercal \matA\vecx = \max_{\vecx\in\bbR^n\setminus\set{\boldsymbol{0}}}\frac{\vecx^\intercal \matA\vecx}{\norm[2]{\vecx}^2}\qquad \lambda_n = \min_{\vecx\in\bbR^n:\norm[2]{\vecx}=1}\vecx^\intercal \matA\vecx = \min_{\vecx\in\bbR^n\setminus\set{\boldsymbol{0}}}\frac{\vecx^\intercal \matA\vecx}{\norm[2]{\vecx}^2} \]
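The variational characterization says the Rayleigh quotient \(\vecx^\intercal\matA\vecx/\norm[2]{\vecx}^2\) is always sandwiched between \(\lambda_n\) and \(\lambda_1\). A quick numerical sanity check on a random SPD matrix (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)        # symmetric positive definite

lams = np.linalg.eigvalsh(A)       # ascending order
lam_min, lam_max = lams[0], lams[-1]

# The Rayleigh quotient x^T A x / ||x||^2 lies in [lam_min, lam_max]
for _ in range(1000):
    x = rng.standard_normal(5)
    q = x @ A @ x / (x @ x)
    assert lam_min * (1 - 1e-9) <= q <= lam_max * (1 + 1e-9)
```

The maximum and minimum are attained at the corresponding eigenvectors, which is why random draws approach but never exceed the bounds.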

For any analytic function \(f\), we have \[ f(\matA) = \sum_{i=1}^n f(\lambda_i)\vecv_i\vecv_i^\intercal \]
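This functional calculus is easy to implement: apply \(f\) to the eigenvalues and reassemble. A sketch, assuming an SPD matrix so that \(f=\sqrt{\cdot}\) is well defined on the spectrum:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)        # SPD, so all eigenvalues are positive

lams, V = np.linalg.eigh(A)

def apply_fun(f):
    # f(A) = V diag(f(lams)) V^T; (V * f(lams)) scales column i by f(lam_i)
    return (V * f(lams)) @ V.T

sqrtA = apply_fun(np.sqrt)
assert np.allclose(sqrtA @ sqrtA, A)       # square root of A, as expected
assert np.allclose(apply_fun(lambda t: t**2), A @ A)  # f(t)=t^2 recovers A^2
```

The same pattern gives the matrix exponential, logarithm, or inverse by swapping in the corresponding scalar function.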

System of symmetric definite equations

  • Consider the system \(\vecy=\matA\vecx\) with \(\matA\) symmetric positive definite

Let \(\set{\vecv_i}\) be the eigenvectors of \(\matA\). \[ \vecx = \sum_{i=1}^n\frac{1}{\lambda_i}\dotp{\vecy}{\vecv_i}\vecv_i \]
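The eigenvector expansion of the solution can be verified directly: project \(\vecy\) onto each eigenvector, divide by the eigenvalue, and sum. A minimal sketch on a random SPD system:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = B @ B.T + np.eye(4)            # symmetric positive definite
x_true = rng.standard_normal(4)
y = A @ x_true

lams, V = np.linalg.eigh(A)
# x = sum_i (1/lam_i) <y, v_i> v_i
x = sum((y @ v) / l * v for l, v in zip(lams, V.T))
assert np.allclose(x, x_true)
```

This is exactly \(\matA^{-1}\vecy\) written in the eigenbasis, which is what makes the error analysis below transparent.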

  • Assume that there exists some observation error \(\vecy=\matA\vecx+\vece\)
    • \(\vece\) is unknown
    • we try to reconstruct \(\vecx\) as \(\widetilde{\vecx}\) by applying \(\matA^{-1}\)

\[ \frac{1}{\lambda_1^2}\norm[2]{\vece}^2\leq \norm[2]{\vecx-\tilde{\vecx}}^2\leq \frac{1}{\lambda_n^2}\norm[2]{\vece}^2. \]
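Since \(\vecx-\tilde{\vecx}=-\matA^{-1}\vece\), the reconstruction error is governed by how \(\matA^{-1}\) stretches \(\vece\), hence the bounds in terms of \(\lambda_1\) and \(\lambda_n\). A numerical check of both bounds (random example, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
A = B @ B.T + np.eye(5)            # symmetric positive definite
x = rng.standard_normal(5)
e = 0.01 * rng.standard_normal(5)  # unknown observation error
y = A @ x + e

x_tilde = np.linalg.solve(A, y)    # reconstruct by applying A^{-1}
err2 = np.sum((x - x_tilde) ** 2)  # ||x - x_tilde||^2

lams = np.linalg.eigvalsh(A)       # ascending: lams[0] = lam_n, lams[-1] = lam_1
assert np.sum(e**2) / lams[-1]**2 <= err2 * (1 + 1e-9)   # lower bound
assert err2 <= np.sum(e**2) / lams[0]**2 * (1 + 1e-9)    # upper bound
```

A small \(\lambda_n\) (ill-conditioned \(\matA\)) makes the upper bound large: the inverse amplifies noise along the weak eigendirections.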

  • What about the average case?

Singular value decomposition

  • What happens for non-square matrices?

Let \(\matA\in\bbR^{m\times n}\) with \(\text{rank}(\matA)=r\). Then \(\matA=\matU\boldsymbol{\Sigma}\matV^T\) where

  • \(\matU\in\bbR^{m\times r}\) such that \(\matU^\intercal\matU=\bfI_r\) (orthonormal columns)
  • \(\matV\in\bbR^{n\times r}\) such that \(\matV^\intercal\matV=\bfI_r\) (orthonormal columns)
  • \(\boldsymbol{\Sigma}\in\bbR^{r\times r}\) is diagonal with positive entries

\[ \boldsymbol{\Sigma}\eqdef\mat{cccc}{\sigma_1&0&0&\cdots\\0&\sigma_2&0&\cdots\\\vdots&&\ddots&\\0&\cdots&\cdots&\sigma_r} \] and \(\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r>0\). The \(\sigma_i\) are called the singular values

  • We say that \(\matA\) is full rank if \(r=\min(m,n)\)

  • We can write \(\matA=\sum_{i=1}^r\sigma_i\vecu_i\vecv_i^\intercal\)
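The compact (rank-\(r\)) SVD and its rank-one expansion can be computed with `np.linalg.svd`. A minimal sketch on a random full-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))    # generically full rank: r = min(6, 4) = 4

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # compact SVD
r = int(np.sum(s > 1e-12))
assert r == 4

# A = sum_i sigma_i u_i v_i^T
A_rebuilt = sum(si * np.outer(u, v) for si, u, v in zip(s, U.T, Vt))
assert np.allclose(A, A_rebuilt)
assert np.allclose(U.T @ U, np.eye(4))   # orthonormal columns
assert np.allclose(Vt @ Vt.T, np.eye(4))
```

NumPy returns the singular values in the convention used here, \(\sigma_1\geq\cdots\geq\sigma_r>0\), and returns \(\matV^\intercal\) rather than \(\matV\).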

Important properties of the SVD

  • The columns of \(\matV\) \(\set{\vecv_i}_{i=1}^r\) are eigenvectors of the psd matrix \(\matA^\intercal\matA\). The singular values \(\set{\sigma_i}_{i=1}^r\) are the square roots of the non-zero eigenvalues of \(\matA^\intercal\matA\).

  • The columns of \(\matU\) \(\set{\vecu_i}_{i=1}^r\) are eigenvectors of the psd matrix \(\matA\matA^\intercal\). The singular values \(\set{\sigma_i}_{i=1}^r\) are the square roots of the non-zero eigenvalues of \(\matA\matA^\intercal\).

  • The columns of \(\matV\) form an orthobasis for \(\text{row}(\matA)\)

  • The columns of \(\matU\) form an orthobasis for \(\text{col}(\matA)\)
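The link between the SVD of \(\matA\) and the eigendecomposition of \(\matA^\intercal\matA\) is easy to confirm numerically. A sketch (random matrix, for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Squared singular values = non-zero eigenvalues of A^T A (and of A A^T)
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # descending, to match s
assert np.allclose(s**2, eig_AtA)

# Each column v_i of V satisfies (A^T A) v_i = sigma_i^2 v_i
for si, v in zip(s, Vt):
    assert np.allclose(A.T @ (A @ v), si**2 * v)
```

The analogous check with `A @ A.T` and the columns of \(\matU\) works the same way.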

  • Equivalent form of the SVD: \(\matA=\widetilde{\matU}\widetilde{\boldsymbol{\Sigma}}\widetilde{\matV}^T\) where

    • \(\widetilde{\matU}\in\bbR^{m\times m}\) is orthogonal (\(\widetilde{\matU}^\intercal\widetilde{\matU}=\bfI_m\))
    • \(\widetilde{\matV}\in\bbR^{n\times n}\) is orthogonal (\(\widetilde{\matV}^\intercal\widetilde{\matV}=\bfI_n\))
    • \(\widetilde{\boldsymbol{\Sigma}}\in\bbR^{m\times n}\) is

    \[ \widetilde{\boldsymbol{\Sigma}}\eqdef\mat{cc}{\boldsymbol{\Sigma}&\boldsymbol{0}\\\boldsymbol{0}&\boldsymbol{0}} \]
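In the full form, \(\widetilde{\boldsymbol{\Sigma}}\) pads the diagonal block \(\boldsymbol{\Sigma}\) with zeros to an \(m\times n\) matrix. A sketch showing the shapes with NumPy's `full_matrices=True` default:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A)        # full SVD: U is 5x5, Vt is 3x3
assert U.shape == (5, 5) and Vt.shape == (3, 3)

# Embed the singular values in an m x n matrix Sigma~
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)

assert np.allclose(A, U @ Sigma @ Vt)
assert np.allclose(U @ U.T, np.eye(5))   # U is orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(3)) # V is orthogonal
```

The extra columns of \(\widetilde{\matU}\) span the orthogonal complement of \(\text{col}(\matA)\); they multiply only zero rows of \(\widetilde{\boldsymbol{\Sigma}}\).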

Next time

  • Next class: Monday October 28, 2024
  • To be effectively prepared for next class, you should:
    1. Go over today's slides and read associated lecture notes here
    2. Work on Homework 5
  • Optional
    • Export slides for next lecture as PDF (be on the lookout for an announcement when they're ready)