Singular value decomposition

Dr. Matthieu R Bloch

Wednesday, November 3, 2021

Logistics

  • General announcements

    • Let me know if you have concerns about your grades (disagreements, etc.)
    • Be careful with honor code
  • Office hours on Friday November 05, 2021

    • 8am-9:30am on BlueJeans
    • Focus on Midterm 2 preparation
  • Midterm 2:

    • Moved to Monday November 8, 2021 (gives you the weekend to prepare)
    • Coverage: everything since Midterm 1 (don’t forget the fundamentals, though), with emphasis on regression

What’s on the agenda for today?

[Figure: “Toddlers can do it!”]
  • Last time:

    • Symmetric matrices and spectral theorem
    • Objective: further understand least-square problems
  • Today: singular value decomposition

  • Reading: lecture notes 12/13

Spectral theorem

  • Every complex matrix \(\matA\) has at least one complex eigenvector, and every real symmetric matrix has real eigenvalues and at least one real eigenvector.

  • Every matrix \(\matA\in\bbC^{n\times n}\) is unitarily similar to an upper triangular matrix, i.e., \[ \matA = \matV\boldsymbol{\Delta}\matV^\dagger \] with \(\boldsymbol{\Delta}\) upper triangular and \(\matV^\dagger=\matV^{-1}\).
  • Every Hermitian matrix is unitarily similar to a real-valued diagonal matrix.
  • Note that if \(\matA = \matV\matD\matV^\dagger\) then \[ \matA = \sum_{i=1}^n\lambda_i \vecv_i\vecv_i^\dagger \] (a numerical check appears below)

  • How about real-valued matrices \(\matA\in\bbR^{n\times n}\)?
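
A minimal numerical check of the outer-product expansion above, for a real symmetric matrix (a sketch, assuming NumPy; `numpy.linalg.eigh` is the standard routine for the symmetric/Hermitian case):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random real symmetric matrix A = (B + B^T)/2
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

# eigh returns real eigenvalues (ascending) and orthonormal eigenvectors
eigvals, V = np.linalg.eigh(A)

# Reconstruct A as a sum of rank-one terms lambda_i v_i v_i^T
A_rec = sum(lam * np.outer(v, v) for lam, v in zip(eigvals, V.T))

print(np.allclose(A, A_rec))            # True: A = sum_i lambda_i v_i v_i^T
print(np.allclose(V.T @ V, np.eye(5)))  # True: V has orthonormal columns
```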

Symmetric positive definite matrices

  • A symmetric matrix \(\matA\) is positive definite if it has positive eigenvalues, i.e., \(\forall i\in\set{1,\cdots,n},\ \lambda_i>0\).

    A symmetric matrix \(\matA\) is positive semidefinite if it has nonnegative eigenvalues, i.e., \(\forall i\in\set{1,\cdots,n},\ \lambda_i\geq 0\).

  • Convention: \(\lambda_1\geq \lambda_2\geq \cdots \geq \lambda_n\)

  • Variational form of extreme eigenvalues for symmetric positive definite matrices \(\matA\) \[ \begin{align} \lambda_1 &= \max_{\vecx\in\bbR^n:\norm[2]{\vecx}=1}\vecx^\intercal \matA\vecx = \max_{\vecx\in\bbR^n}\frac{\vecx^\intercal \matA\vecx}{\norm[2]{\vecx}^2}\\ \lambda_n &= \min_{\vecx\in\bbR^n:\norm[2]{\vecx}=1}\vecx^\intercal \matA\vecx = \min_{\vecx\in\bbR^n}\frac{\vecx^\intercal \matA\vecx}{\norm[2]{\vecx}^2} \end{align} \]
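
The variational form can be checked empirically by sampling the Rayleigh quotient (a minimal sketch, assuming NumPy; the random search only approximately attains the extremes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random symmetric positive definite matrix
B = rng.standard_normal((4, 4))
A = B.T @ B + np.eye(4)

eigvals = np.linalg.eigvalsh(A)          # ascending order
lam_n, lam_1 = eigvals[0], eigvals[-1]   # convention: lambda_1 >= ... >= lambda_n

# Rayleigh quotient x^T A x / ||x||_2^2 over many random directions
X = rng.standard_normal((4, 100_000))
rq = np.einsum('ij,ij->j', X, A @ X) / np.einsum('ij,ij->j', X, X)

# Every quotient lies in [lambda_n, lambda_1]; the extremes are nearly attained
print(lam_n <= rq.min() and rq.max() <= lam_1)   # True
print(lam_1 - rq.max(), rq.min() - lam_n)        # small nonnegative gaps
```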

  • For any analytic function \(f\), we have \[ f(\matA) = \sum_{i=1}^n f(\lambda_i)\vecv_i\vecv_i^\intercal \]
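
As an instance of the functional calculus above, take \(f(t)=\sqrt{t}\) on a positive definite matrix (a sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B.T @ B + np.eye(4)                # symmetric positive definite

eigvals, V = np.linalg.eigh(A)

# f(A) = sum_i f(lambda_i) v_i v_i^T, computed here as V f(D) V^T
sqrtA = (V * np.sqrt(eigvals)) @ V.T   # scales column i of V by sqrt(lambda_i)

print(np.allclose(sqrtA @ sqrtA, A))   # True: sqrt(A)^2 = A
```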

Systems of symmetric positive definite equations

  • Consider the system \(\vecy=\matA\vecx\) with \(\matA\) symmetric positive definite

  • Let \(\set{\vecv_i}\) be the orthonormal eigenvectors of \(\matA\); then \[ \vecx = \sum_{i=1}^n\frac{1}{\lambda_i}\dotp{\vecy}{\vecv_i}\vecv_i \]

  • Assume some observation error \(\vecy=\matA\vecx+\vece\), with \(\vece\) unknown, and suppose we reconstruct \(\vecx\) as \(\widetilde{\vecx}=\matA^{-1}\vecy\)

  • Since \(\vecx-\widetilde{\vecx}=-\matA^{-1}\vece\), \[ \frac{1}{\lambda_1^2}\norm[2]{\vece}^2\leq \norm[2]{\vecx-\widetilde{\vecx}}^2\leq \frac{1}{\lambda_n^2}\norm[2]{\vece}^2. \]
  • If \(\vece\sim\calN(\boldsymbol{0},\sigma^2\matI)\), then \[ \E{\norm[2]{\vecx-\widetilde{\vecx}}^2}=\sigma^2\sum_{i=1}^n\frac{1}{\lambda_i^2} \]
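
A small numerical illustration of the eigen-expansion solution and the error amplification bound (a sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
B = rng.standard_normal((n, n))
A = B.T @ B + np.eye(n)                # symmetric positive definite

eigvals, V = np.linalg.eigh(A)         # ascending: eigvals[0] = lambda_n
x = rng.standard_normal(n)

# Noisy observation and reconstruction x~ = A^{-1} y via the eigen-expansion
e = 0.01 * rng.standard_normal(n)
y = A @ x + e
x_tilde = sum((y @ v) / lam * v for lam, v in zip(eigvals, V.T))

err2 = np.sum((x - x_tilde) ** 2)      # ||x - x~||_2^2
e2 = np.sum(e ** 2)                    # ||e||_2^2
lam_n, lam_1 = eigvals[0], eigvals[-1]

# (1/lambda_1^2)||e||^2 <= ||x - x~||^2 <= (1/lambda_n^2)||e||^2
print(e2 / lam_1**2 <= err2 <= e2 / lam_n**2)   # True
```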

Singular value decomposition

  • What happens for non-square matrices?

  • Let \(\matA\in\bbR^{m\times n}\) with \(\text{rank}(\matA)=r\). Then \(\matA=\matU\boldsymbol{\Sigma}\matV^T\) where

    • \(\matU\in\bbR^{m\times r}\) such that \(\matU^\intercal\matU=\bfI_r\) (orthonormal columns)
    • \(\matV\in\bbR^{n\times r}\) such that \(\matV^\intercal\matV=\bfI_r\) (orthonormal columns)
    • \(\boldsymbol{\Sigma}\in\bbR^{r\times r}\) is diagonal with positive entries \[ \boldsymbol{\Sigma}\eqdef\mat{cccc}{\sigma_1&0&0&\cdots\\0&\sigma_2&0&\cdots\\\vdots&&\ddots&\\0&\cdots&\cdots&\sigma_r} \] and \(\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r>0\). The \(\sigma_i\) are called the singular values
  • We say that \(\matA\) is full rank if \(r=\min(m,n)\)

  • We can write \(\matA=\sum_{i=1}^r\sigma_i\vecu_i\vecv_i^\intercal\) (a numerical check appears below)
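
A sketch of the compact SVD and the rank-one expansion (assuming NumPy; truncating `np.linalg.svd` to the \(r\) nonzero singular values yields the compact form):

```python
import numpy as np

rng = np.random.default_rng(4)

# A rank-2 matrix in R^{6x4}
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
r = np.linalg.matrix_rank(A)                     # r = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]            # keep nonzero singular values

print(np.allclose(A, U @ np.diag(s) @ Vt))       # compact SVD reconstructs A
print(np.allclose(A, sum(si * np.outer(u, v)     # A = sum_i sigma_i u_i v_i^T
                         for si, u, v in zip(s, U.T, Vt))))
```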

  • Important properties of the SVD

    • The columns \(\set{\vecv_i}_{i=1}^r\) of \(\matV\) are eigenvectors of the psd matrix \(\matA^\intercal\matA\), and the singular values \(\set{\sigma_i}_{i=1}^r\) are the square roots of the nonzero eigenvalues of \(\matA^\intercal\matA\).

    • The columns \(\set{\vecu_i}_{i=1}^r\) of \(\matU\) are eigenvectors of the psd matrix \(\matA\matA^\intercal\), and the singular values \(\set{\sigma_i}_{i=1}^r\) are the square roots of the nonzero eigenvalues of \(\matA\matA^\intercal\).

    • The columns of \(\matV\) form an orthobasis for \(\text{row}(\matA)\)

    • The columns of \(\matU\) form an orthobasis for \(\text{col}(\matA)\)

    • Equivalent form of the SVD: \(\matA=\widetilde{\matU}\widetilde{\boldsymbol{\Sigma}}\widetilde{\matV}^T\) where

      • \(\widetilde{\matU}\in\bbR^{m\times m}\) is orthogonal
      • \(\widetilde{\matV}\in\bbR^{n\times n}\) is orthogonal
      • \(\widetilde{\boldsymbol{\Sigma}}\in\bbR^{m\times n}\) is \[ \widetilde{\boldsymbol{\Sigma}}\eqdef\mat{cc}{\boldsymbol{\Sigma}&\boldsymbol{0}\\\boldsymbol{0}&\boldsymbol{0}} \]
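
These properties are straightforward to verify numerically (a sketch, assuming NumPy; `full_matrices=True` produces the square \(\widetilde{\matU}\), \(\widetilde{\matV}\) of the equivalent form):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))                  # generically full rank (r = 4)

# Full SVD: U~ is 6x6, V~ is 4x4, Sigma~ is 6x4 with a zero block
U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros((6, 4))
S[:4, :4] = np.diag(s)
print(np.allclose(A, U @ S @ Vt))                # True

# The sigma_i^2 are the nonzero eigenvalues of A^T A (and of A A^T)
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]      # descending
print(np.allclose(s**2, eig_AtA))                # True
```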

SVD and least-squares

  • When we cannot solve \(\vecy=\matA\vecx\), we solve instead \[ \min_{\vecx\in\bbR^n}\norm[2]{\vecx}^2\text{ such that } \matA^\intercal\matA\vecx = \matA^\intercal\vecy \]

    • This allows us to pick the minimum-norm solution among the potentially infinitely many solutions of the normal equations.
  • Recall: when \(\matA\in\bbR^{m\times n}\) is of rank \(m\) (full row rank), then \(\vecx=\matA^\intercal(\matA\matA^\intercal)^{-1}\vecy\)

  • The solution of \[ \min_{\vecx\in\bbR^n}\norm[2]{\vecx}^2\text{ such that } \matA^\intercal\matA\vecx = \matA^\intercal\vecy \] is \[ \hat{\vecx} = \matV\boldsymbol{\Sigma}^{-1}\matU^\intercal\vecy \] where \(\matA=\matU\boldsymbol{\Sigma}\matV^\intercal\) is the SVD of \(\matA\).
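
A sketch comparing the SVD formula with `np.linalg.lstsq`, which also returns the minimum-norm least-squares solution (assumed small random instance):

```python
import numpy as np

rng = np.random.default_rng(6)

# Wide system: more unknowns than equations, so infinitely many solutions
A = rng.standard_normal((3, 5))
y = rng.standard_normal(3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # compact SVD (rank 3 here)
x_hat = Vt.T @ ((U.T @ y) / s)                     # x^ = V Sigma^{-1} U^T y

x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]
print(np.allclose(x_hat, x_lstsq))                 # True: same min-norm solution
```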

Pseudo-inverse

  • \(\matA^+ = \matV\boldsymbol{\Sigma}^{-1}\matU^\intercal\) is called the pseudo-inverse, Lanczos inverse, or Moore-Penrose inverse of \(\matA=\matU\boldsymbol{\Sigma}\matV^\intercal\).

  • If \(\matA\) is square and invertible, then \(\matA^+=\matA^{-1}\)

  • If \(m\geq n\) (tall and skinny matrix) of rank \(n\), then \(\matA^+ = (\matA^\intercal\matA)^{-1}\matA^\intercal\)

  • If \(m\leq n\) (short and fat matrix) of rank \(m\), then \(\matA^+ = \matA^\intercal(\matA\matA^\intercal)^{-1}\)

  • Note that \(\matA^+\) is as “close” to an inverse of \(\matA\) as possible
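
A closing sketch checking the closed-form expressions for \(\matA^+\) against `np.linalg.pinv` (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(7)

# Tall, full column rank: A^+ = (A^T A)^{-1} A^T
A = rng.standard_normal((6, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1 / s) @ U.T               # V Sigma^{-1} U^T
print(np.allclose(A_pinv, np.linalg.pinv(A)))
print(np.allclose(A_pinv, np.linalg.inv(A.T @ A) @ A.T))

# Short and fat, full row rank: A^+ = A^T (A A^T)^{-1}
B = rng.standard_normal((3, 6))
print(np.allclose(np.linalg.pinv(B), B.T @ np.linalg.inv(B @ B.T)))
```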