# Singular value decomposition

Wednesday, November 10, 2021

## Logistics

• General announcements

• Assignment 6 to be posted tonight

• 8 lectures left!

• Midterm 2:

• Grading starting, we’ll keep you posted

## What’s on the agenda for today?

• Last time:

• Singular value decomposition
• Today

• More singular value decomposition

• Application to solving least squares

## Singular value decomposition

• What happens for non-square matrice?

• Let $\matA\in\bbR^{m\times n}$ with $\text{rank}(\matA)=r$. Then $\matA=\matU\boldsymbol{\Sigma}\matV^T$ where

• $\matU\in\bbR^{m\times r}$ such that $\matU^\intercal\matU=\bfI_r$ (orthonormal columns)
• $\matV\in\bbR^{n\times r}$ such that $\matV^\intercal\matV=\bfI_r$ (orthonormal columns)
• $\boldsymbol{\Sigma}\in\bbR^{r\times r}$ is diagonal with positive entries $\boldsymbol{\Sigma}\eqdef\mat{cccc}{\sigma_1&0&0&\cdots\\0&\sigma_2&0&\cdots\\\vdots&&\ddots&\\0&\cdots&\cdots&\sigma_r}$ and $\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r>0$. The $\sigma_i$ are called the singular values
• We say that $\matA$ is full rank is $r=\min(m,n)$

• We can write $\matA=\sum_{i=1}^r\sigma_i\vecu_i\vecv_i^\intercal$

• ## Important properties of the SVD

• The columns of $\matV$ $\set{\vecv_i}_{i=1}^r$ are eigenvectors of the psd matrix $\matA^\intercal\matA$. $\set{\sigma_i:1\leq i\leq n\text{ and } \sigma_i\neq 0}$ are the square roots of the non-zero eigenvalues of $\matA^\intercal\matA$.

• The columns of $\matU$ $\set{\vecu_i}_{i=1}^r$ are eigenvectors of the psd matrix $\matA\matA^\intercal$. $\set{\sigma_i:1\leq i\leq n\text{ and } \sigma_i\neq 0}$ are the square roots of the non-zero eigenvalues of $\matA\matA^\intercal$.

• The columns of $\matV$ form an orthobasis for $\text{row}(\matA)$

• The columns of $\matU$ form an orthobasis for $\text{col}(\matA)$

• Equivalent form of the SVD: $\matA=\widetilde{\matU}\widetilde{\boldsymbol{\Sigma}}\widetilde{\matV}^T$ where

• $\widetilde{\matU}\in\bbR^{m\times m}$ is orthonormal
• $\widetilde{\matV}\in\bbR^{n\times n}$ is orthonormal
• $\widetilde{\boldsymbol{\Sigma}}\in\bbR^{m\times n}$ is

$\widetilde{\boldsymbol{\Sigma}}\eqdef\mat{cc}{\boldsymbol{\Sigma}&\boldsymbol{0}\\\boldsymbol{0}&\boldsymbol{0}}$

## SVD and least-squares

• When we cannot solve $\vecy=\matA\vecx$, we solve instead $\min_{\bfx\in\bbR^n}\norm{\vecx}^2\text{ such that } \matA^\intercal\matA\vecx = \matA^\intercal\vecy$

• This allows us to pick the minimum norm solution among potentially infinitely many solutions of the normal equations.
• Recall: when $\matA\in\bbR^{m\times n}$ is of rank $n$, then $\bfx=\matA^\intercal(\matA\matA^\intercal)^{-1}\vecy$

• The solution of $\min_{\bfx\in\bbR^n}\norm{\vecx}^2\text{ such that } \matA^\intercal\matA\vecx = \matA^\intercal\vecy$ is $\hat{\vecx} = \matV\boldsymbol{\Sigma}^{-1}\matU^\intercal\vecy = \sum_{i=1}^r\frac{1}{\sigma_i}\dotp{\vecy}{\vecu_i}\vecv_i$ where $\matA=\matU\boldsymbol{\Sigma}\matV^T$ is the SVD of $\matA$.

## Pseudo inverse

• $\matA^+ = \matV\boldsymbol{\Sigma}^{-1}\matU^\intercal$ is called the pseudo-inverse, Lanczos inverse, or Moore-Penrose inverse of $\matA=\matU\boldsymbol{\Sigma}\matV^T$.

• If $\matA$ is square invertible then $\matA^+=\matA$

• If $m\geq n$ (tall and skinny matrix) of rank $n$ then $\matA^+ = (\matA^\intercal\matA)^{-1}\matA^\intercal$

• If $m\geq m$ (short and fat matrix) of rank $m$ then $\matA^+ = \matA^\intercal(\matA\matA^\intercal)^{-1}$

• Note $\matA^+$ is as “close” to an inverse of $\matA$ as possible

## Stability of least squares

• What if we observe $\vecy = \matA\vecx_0+\vece$ and we apply the pseudo inverse? $\hat{\vecx} = \matA^+\vecy$

• We can separate the error analysis into two components $\hat{\vecx}-\vecx_0 = \underbrace{\matA^+\matA\vecx_0-\vecx_0}_{\text{null space error}} + \underbrace{\matA^+\vece}_{\text{noise error}}$

• We will express the error in terms of the SVD $\matA=\matU\boldsymbol{\Sigma}\matV^\intercal$ With

• $\set{\vecv_i}_{i=1}^r$ orthobasis of $\text{row}(\matA)$, augmented by $\set{\vecv_i}_{i=1}^{r+1}\in\ker{\matA}$ to form an orthobasis of $\bbR^n$
• $\set{\vecu_i}_{i=1}^r$ orthobasis of $\text{col}(\matA)$, augmented by $\set{\vecu}_{i=1}^{r+1}\in\ker{\matA^\intercal}$ to form an orthobasis of $\bbR^m$
• The null space error is given by $\norm{\matA^+\matA\vecx_0-\vecx_0}^2=\sum_{i=r+1}^n\abs{\dotp{\vecv_i}{x_0}}^2$

• The noise error is given by $\norm{\matA^+\vece}^2=\sum_{i=1}^r \frac{1}{\sigma_i^2}\abs{\dotp{\vece}{\vecu_i}}^2$

## Stable reconstruction by truncation

• How do we mitigate the effect of small singular values in reconstruction? $\hat{\vecx} = \matV\boldsymbol{\Sigma}^{-1}\matU^\intercal\vecy = \sum_{i=1}^r\frac{1}{\sigma_i}\dotp{\vecy}{\vecu_i}\vecv_i$

• Truncate the SVD to $r'<r$ $\matA_t\eqdef \sum_{i=1}^{r'}\sigma_i\vecu_i\vecv_i^\intercal\qquad\matA_t^+ = \sum_{i=1}^{r'}\frac{1}{\sigma_i}\vecu_i\vecv_i^\intercal$

• Reconstruct $\hat{\vecx_t} = \sum_{i=1}^{r'}\frac{1}{\sigma_i}\dotp{\vecy}{\vecu_i}\vecv_i=\matA_t$

• Error analysis: $\norm{\hat{\vecx}_t}^2 = \sum_{i=r+1}^n\abs{\dotp{\vecx_0}{\vecv_i}}^2+\sum_{i=r'+1}^r\abs{\dotp{\vecx_0}{\vecv_i}}^2+\sum_{i=1}^r'\frac{1}{\sigma_i^2}\abs{\dotp{\vece}{\vecu_i}}^2$

## Stable reconstruction by regularization

• Regularization means changing the problem to solve $\min_{\vecx\in\bbR^n}\norm{\vecy-\matA\vecx}^2+\lambda\norm{\vecx}^2\qquad\ \lambda>0$

• The solution is $\hat{\vecx} = (\matA^\intercal\matA+\lambda\matI)^{-1}\matA^\intercal\vecy = \matV(\boldsymbol{\Sigma}^2+\lambda\matI)^{-1}\boldsymbol{\Sigma}\matU^\intercal\vecy$