# Singular value decomposition

Wednesday, November 3, 2021

## Logistics

• General announcements

• Be careful with honor code
• Office hours on Friday November 05, 2021

• 8am-9:30am on BlueJeans
• Focus on Midterm 2 preparation
• Midterm 2:

• Moved to Monday November 8, 2021 (gives you weekend to prepare)
• Coverage: everything since Midterm 1 (don’t forget the fundamentals though), emphasis on regression

## What’s on the agenda for today?

• Last time:

• Symmetric matrices and spectral theorem
• Objective: further understand least-square problems
• Today: singular value decomposition

## Spectral theorem

• Every complex square matrix $\matA$ has at least one complex eigenvector, and every real symmetric matrix has real eigenvalues and at least one real eigenvector.

• Every matrix $\matA\in\bbC^{n\times n}$ is unitarily similar to an upper triangular matrix, i.e., $\matA = \matV\boldsymbol{\Delta}\matV^\dagger$ with $\boldsymbol{\Delta}$ upper triangular and $\matV^\dagger=\matV^{-1}$.
• Every Hermitian matrix is unitarily similar to a real-valued diagonal matrix.
• Note that if $\matA = \matV\matD\matV^\dagger$ with $\matD$ diagonal, then $\matA = \sum_{i=1}^n\lambda_i \vecv_i\vecv_i^\dagger$

• How about real-valued matrices $\matA\in\bbR^{n\times n}$?
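
For real symmetric matrices, the answer is yes: the spectral theorem gives a real orthogonal diagonalization. A short NumPy sketch (a random symmetric matrix as a hypothetical example) checking the decomposition and its rank-one expansion:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical example: symmetrize a random matrix to get a real symmetric A
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

# Spectral theorem: A = V D V^T with real eigenvalues and orthonormal V
lam, V = np.linalg.eigh(A)
assert np.allclose(V @ np.diag(lam) @ V.T, A)
assert np.allclose(V.T @ V, np.eye(4))  # V has orthonormal columns

# Equivalent rank-one expansion: A = sum_i lam_i v_i v_i^T
A_sum = sum(lam[i] * np.outer(V[:, i], V[:, i]) for i in range(4))
assert np.allclose(A_sum, A)
```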

## Symmetric positive definite matrices

• A symmetric matrix $\matA$ is positive definite if it has positive eigenvalues, i.e., $\forall i\in\set{1,\cdots,n},\ \lambda_i>0$.

• A symmetric matrix $\matA$ is positive semidefinite if it has nonnegative eigenvalues, i.e., $\forall i\in\set{1,\cdots,n},\ \lambda_i\geq 0$.

• Convention: $\lambda_1\geq \lambda_2\geq \cdots \geq \lambda_n$

• Variational form of extreme eigenvalues for symmetric positive definite matrices $\matA$ \begin{align} \lambda_1 &= \max_{\vecx\in\bbR^n:\norm[2]{\vecx}=1}\vecx^\intercal \matA\vecx = \max_{\vecx\in\bbR^n\setminus\set{\boldsymbol{0}}}\frac{\vecx^\intercal \matA\vecx}{\norm[2]{\vecx}^2}\\ \lambda_n &= \min_{\vecx\in\bbR^n:\norm[2]{\vecx}=1}\vecx^\intercal \matA\vecx = \min_{\vecx\in\bbR^n\setminus\set{\boldsymbol{0}}}\frac{\vecx^\intercal \matA\vecx}{\norm[2]{\vecx}^2} \end{align}

• For any analytic function $f$, we have $f(\matA) = \sum_{i=1}^n f(\lambda_i)\vecv_i\vecv_i^\intercal$
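
Both facts can be sanity-checked numerically. A sketch with a hypothetical SPD matrix: Rayleigh quotients stay inside $[\lambda_n,\lambda_1]$, and the functional calculus with $f(t)=\sqrt{t}$ gives a matrix square root (note `eigh` returns eigenvalues in ascending order, opposite to the $\lambda_1\geq\cdots\geq\lambda_n$ convention above):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical SPD matrix: M^T M + I is symmetric positive definite
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)

lam, V = np.linalg.eigh(A)            # ascending: lam[0] = lambda_n, lam[-1] = lambda_1
lam_min, lam_max = lam[0], lam[-1]

# Rayleigh quotients x^T A x / ||x||^2 never leave [lam_min, lam_max]
for _ in range(1000):
    x = rng.standard_normal(5)
    q = x @ A @ x / (x @ x)
    assert lam_min - 1e-10 <= q <= lam_max + 1e-10

# f(A) = sum_i f(lam_i) v_i v_i^T with f = sqrt gives a matrix square root
sqrtA = V @ np.diag(np.sqrt(lam)) @ V.T
assert np.allclose(sqrtA @ sqrtA, A)
```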

## System of symmetric definite equations

• Consider the system $\vecy=\matA\vecx$ with $\matA$ symmetric positive definite

• Let $\set{\vecv_i}$ be the eigenvectors of $\matA$. Then $\vecx = \sum_{i=1}^n\frac{1}{\lambda_i}\dotp{\vecy}{\vecv_i}\vecv_i$

• Assume some observation error $\vecy=\matA\vecx+\vece$, with $\vece$ unknown, and we reconstruct $\vecx$ as $\widetilde{\vecx}$ by applying $\matA^{-1}$

• $\frac{1}{\lambda_1^2}\norm[2]{\vece}^2\leq \norm[2]{\vecx-\widetilde{\vecx}}^2\leq \frac{1}{\lambda_n^2}\norm[2]{\vece}^2.$
• If $\vece\sim\calN(\boldsymbol{0},\sigma^2\matI)$, then $\E{\norm[2]{\vecx-\widetilde{\vecx}}^2}=\sigma^2\sum_{i=1}^n\frac{1}{\lambda_i^2}$
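
A numerical sketch of both points, using a made-up SPD system: the eigen-expansion recovers $\vecx$ exactly in the noiseless case, and with noise the reconstruction error $\matA^{-1}\vece$ lands between the two bounds above:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = M.T @ M + np.eye(4)          # hypothetical symmetric positive definite matrix

lam, V = np.linalg.eigh(A)       # ascending: lam[0] = lambda_n, lam[-1] = lambda_1
x = rng.standard_normal(4)
y = A @ x

# Noiseless: x = sum_i (1/lam_i) <y, v_i> v_i
x_rec = sum((y @ V[:, i]) / lam[i] * V[:, i] for i in range(4))
assert np.allclose(x_rec, x)

# Noisy: x - x_tilde = -A^{-1} e, so
# ||e||^2 / lam_max^2 <= ||x - x_tilde||^2 <= ||e||^2 / lam_min^2
e = 0.01 * rng.standard_normal(4)
x_tilde = np.linalg.solve(A, y + e)
err2 = np.sum((x - x_tilde) ** 2)
e2 = np.sum(e ** 2)
assert e2 / lam[-1] ** 2 - 1e-12 <= err2 <= e2 / lam[0] ** 2 + 1e-12
```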

## Singular value decomposition

• What happens for non-square matrices?

• Let $\matA\in\bbR^{m\times n}$ with $\text{rank}(\matA)=r$. Then $\matA=\matU\boldsymbol{\Sigma}\matV^T$ where

• $\matU\in\bbR^{m\times r}$ such that $\matU^\intercal\matU=\bfI_r$ (orthonormal columns)
• $\matV\in\bbR^{n\times r}$ such that $\matV^\intercal\matV=\bfI_r$ (orthonormal columns)
• $\boldsymbol{\Sigma}\in\bbR^{r\times r}$ is diagonal with positive entries $\boldsymbol{\Sigma}\eqdef\mat{cccc}{\sigma_1&0&0&\cdots\\0&\sigma_2&0&\cdots\\\vdots&&\ddots&\\0&\cdots&\cdots&\sigma_r}$ and $\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r>0$. The $\sigma_i$ are called the singular values
• We say that $\matA$ is full rank if $r=\min(m,n)$

• We can write $\matA=\sum_{i=1}^r\sigma_i\vecu_i\vecv_i^\intercal$
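
A NumPy sketch of the compact SVD on a hypothetical rank-2 matrix in $\bbR^{5\times 3}$, checking the orthonormality of $\matU$, $\matV$ and the rank-one expansion:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical rank-2 matrix: product of 5x2 and 2x3 factors
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))

# Compact SVD: keep only the r strictly positive singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

assert r == 2
assert np.allclose(U.T @ U, np.eye(r))   # orthonormal columns of U
assert np.allclose(Vt @ Vt.T, np.eye(r)) # orthonormal columns of V
assert np.allclose(U @ np.diag(s) @ Vt, A)

# Rank-one expansion: A = sum_i sigma_i u_i v_i^T
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(r))
assert np.allclose(A_sum, A)
```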

## Important properties of the SVD

• The columns $\set{\vecv_i}_{i=1}^r$ of $\matV$ are eigenvectors of the psd matrix $\matA^\intercal\matA$, and the singular values $\set{\sigma_i}_{i=1}^r$ are the square roots of its nonzero eigenvalues.

• The columns $\set{\vecu_i}_{i=1}^r$ of $\matU$ are eigenvectors of the psd matrix $\matA\matA^\intercal$, and the singular values $\set{\sigma_i}_{i=1}^r$ are the square roots of its nonzero eigenvalues.

• The columns of $\matV$ form an orthobasis for $\text{row}(\matA)$

• The columns of $\matU$ form an orthobasis for $\text{col}(\matA)$

• Equivalent form of the SVD: $\matA=\widetilde{\matU}\widetilde{\boldsymbol{\Sigma}}\widetilde{\matV}^T$ where

• $\widetilde{\matU}\in\bbR^{m\times m}$ is orthogonal
• $\widetilde{\matV}\in\bbR^{n\times n}$ is orthogonal
• $\widetilde{\boldsymbol{\Sigma}}\in\bbR^{m\times n}$ is $\widetilde{\boldsymbol{\Sigma}}\eqdef\mat{cc}{\boldsymbol{\Sigma}&\boldsymbol{0}\\\boldsymbol{0}&\boldsymbol{0}}$
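
These properties are easy to verify numerically. A sketch with a hypothetical $5\times 3$ matrix: the full SVD uses square orthogonal factors with a zero-padded $\widetilde{\boldsymbol{\Sigma}}$, and the squared singular values match the eigenvalues of $\matA^\intercal\matA$:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))   # hypothetical full-rank example

# Full SVD: U is 5x5 orthogonal, V is 3x3 orthogonal, Sigma is 5x3 zero-padded
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)
assert np.allclose(U @ Sigma @ Vt, A)
assert np.allclose(U.T @ U, np.eye(5))
assert np.allclose(Vt @ Vt.T, np.eye(3))

# Squared singular values = eigenvalues of A^T A (eigvalsh returns ascending order)
lam = np.linalg.eigvalsh(A.T @ A)
assert np.allclose(np.sort(s ** 2), lam)
```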

## SVD and least-squares

• When we cannot solve $\vecy=\matA\vecx$, we solve instead $\min_{\bfx\in\bbR^n}\norm[2]{\vecx}^2\text{ such that } \matA^\intercal\matA\vecx = \matA^\intercal\vecy$

• This allows us to pick the minimum norm solution among potentially infinitely many solutions of the normal equations.
• Recall: when $\matA\in\bbR^{m\times n}$ is of rank $m$ (underdetermined, $m\leq n$), the minimum norm solution is $\hat{\vecx}=\matA^\intercal(\matA\matA^\intercal)^{-1}\vecy$

• The solution of $\min_{\bfx\in\bbR^n}\norm[2]{\vecx}^2\text{ such that } \matA^\intercal\matA\vecx = \matA^\intercal\vecy$ is $\hat{\vecx} = \matV\boldsymbol{\Sigma}^{-1}\matU^\intercal\vecy$ where $\matA=\matU\boldsymbol{\Sigma}\matV^T$ is the SVD of $\matA$.
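
A sketch of this formula on a hypothetical underdetermined system (3 equations, 5 unknowns): $\hat{\vecx}=\matV\boldsymbol{\Sigma}^{-1}\matU^\intercal\vecy$ satisfies the normal equations and agrees with NumPy's own minimum-norm least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical underdetermined system: 3 equations, 5 unknowns
A = rng.standard_normal((3, 5))
y = rng.standard_normal(3)

# Minimum-norm solution via the compact SVD: x_hat = V Sigma^{-1} U^T y
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_hat = Vt.T @ ((U.T @ y) / s)

# It satisfies the normal equations A^T A x = A^T y ...
assert np.allclose(A.T @ A @ x_hat, A.T @ y)

# ... and matches lstsq, which also returns the minimum-norm solution
x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]
assert np.allclose(x_hat, x_lstsq)
```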

## Pseudo inverse

• $\matA^+ = \matV\boldsymbol{\Sigma}^{-1}\matU^\intercal$ is called the pseudo-inverse, Lanczos inverse, or Moore-Penrose inverse of $\matA=\matU\boldsymbol{\Sigma}\matV^T$.

• If $\matA$ is square invertible then $\matA^+=\matA^{-1}$

• If $m\geq n$ (tall and skinny matrix) of rank $n$ then $\matA^+ = (\matA^\intercal\matA)^{-1}\matA^\intercal$

• If $m\leq n$ (short and fat matrix) of rank $m$ then $\matA^+ = \matA^\intercal(\matA\matA^\intercal)^{-1}$

• Note $\matA^+$ is as “close” to an inverse of $\matA$ as possible: $\matA\matA^+$ and $\matA^+\matA$ are orthogonal projections onto $\text{col}(\matA)$ and $\text{row}(\matA)$, respectively
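
A sketch checking the three special cases against `np.linalg.pinv` (random full-rank matrices as hypothetical examples):

```python
import numpy as np

rng = np.random.default_rng(6)

# Tall full-column-rank case: A^+ = (A^T A)^{-1} A^T is a left inverse
A = rng.standard_normal((6, 3))
Ap = np.linalg.pinv(A)
assert np.allclose(Ap, np.linalg.inv(A.T @ A) @ A.T)
assert np.allclose(Ap @ A, np.eye(3))

# Short full-row-rank case: B^+ = B^T (B B^T)^{-1} is a right inverse
B = rng.standard_normal((3, 6))
Bp = np.linalg.pinv(B)
assert np.allclose(Bp, B.T @ np.linalg.inv(B @ B.T))
assert np.allclose(B @ Bp, np.eye(3))

# Square invertible case (shifted to be well-conditioned): C^+ = C^{-1}
C = rng.standard_normal((4, 4)) + 4 * np.eye(4)
assert np.allclose(np.linalg.pinv(C), np.linalg.inv(C))
```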