Singular value decomposition

Logistics

General announcements
- Assignment 6 to be posted… (grading traffic jam)
- 7 lectures left!
Assignment 5 and Midterm 2:
- Grading starting, we’ll keep you posted

What happens for non-square matrices?
Let $A \in R^{m \times n}$ with $rank (A) = r$ . Then $A = U Σ V^{T}$ where
- $U \in R^{m \times r}$ such that $U^{⊺} U = I_{r}$ (orthonormal columns)
- $V \in R^{n \times r}$ such that $V^{⊺} V = I_{r}$ (orthonormal columns)
- $Σ \in R^{r \times r}$ is diagonal with positive entries $Σ ≜ \left[\begin{array}{cccc} σ_{1} & 0 & \dots & 0 \\ 0 & σ_{2} & \dots & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & σ_{r} \end{array}\right]$ and $σ_{1} \geq σ_{2} \geq \dots \geq σ_{r} > 0$. The $σ_{i}$ are called the singular values.

We say that $A$ is full rank if $r = \min (m, n)$

We can write $A = \sum_{i = 1}^{r} σ_{i} u_{i} v_{i}^{⊺}$
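
As a rough numerical illustration of the compact SVD and its rank-one expansion, the sketch below uses NumPy. The helper name `compact_svd` and the relative tolerance `tol` used to decide which singular values count as non-zero are assumptions made for this example.

```python
import numpy as np

def compact_svd(A, tol=1e-10):
    """Return U (m x r), s (length r), V (n x r), keeping singular values above tol * s_max."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0])) if s.size and s[0] > 0 else 0
    return U[:, :r], s[:r], Vt[:r, :].T

rng = np.random.default_rng(0)
# Build a 5 x 4 matrix of rank 2 as a product of two thin factors.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
U, s, V = compact_svd(A)

# Rebuild A as a sum of r rank-one terms sigma_i * u_i * v_i^T.
A_sum = sum(s[i] * np.outer(U[:, i], V[:, i]) for i in range(len(s)))
```

Here `U @ np.diag(s) @ V.T` and `A_sum` should both match `A` up to floating-point error, and `len(s)` plays the role of $r$.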

The columns of $V$, $\{v_{i}\}_{i = 1}^{r}$, are eigenvectors of the psd matrix $A^{⊺} A$. The singular values $\{σ_{i} : 1 \leq i \leq r\}$ are the square roots of the non-zero eigenvalues of $A^{⊺} A$.
The columns of $U$, $\{u_{i}\}_{i = 1}^{r}$, are eigenvectors of the psd matrix $A A^{⊺}$. The singular values $\{σ_{i} : 1 \leq i \leq r\}$ are also the square roots of the non-zero eigenvalues of $A A^{⊺}$.
The columns of $V$ form an orthobasis for $row (A)$
The columns of $U$ form an orthobasis for $col (A)$
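
The eigenvector relations above can be checked numerically. This is a minimal sketch assuming NumPy and a random full-column-rank test matrix; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))          # full column rank with probability 1
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Eigenvalues of the symmetric psd matrix A^T A, sorted in decreasing order.
evals = np.linalg.eigvalsh(A.T @ A)[::-1]

# Each column v_i satisfies (A^T A) v_i = sigma_i^2 v_i,
# and each column u_i satisfies (A A^T) u_i = sigma_i^2 u_i.
residual_V = A.T @ A @ V - V * s**2
residual_U = A @ A.T @ U - U * s**2
```

Both residuals should vanish up to rounding, and `np.sqrt(evals)` should agree with `s`.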
Equivalent form of the SVD: $A = \tilde{U} \tilde{Σ} {\tilde{V}}^{T}$ where
- $\tilde{U} \in R^{m \times m}$ is orthogonal
- $\tilde{V} \in R^{n \times n}$ is orthogonal
- $\tilde{Σ} \in R^{m \times n}$ is $\tilde{Σ} ≜ \left[\begin{array}{cc} Σ & 0 \\ 0 & 0 \end{array}\right]$
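
The full form can be obtained in NumPy with `full_matrices=True`. The sketch below assumes a test matrix of known rank `r = 2`. It builds the zero-padded $\tilde{Σ}$ explicitly and checks that the extra columns of $\tilde{V}$ lie in the null space of $A$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))   # 4 x 6, rank 2
m, n = A.shape
r = 2  # known by construction in this example

U_full, s, Vt_full = np.linalg.svd(A, full_matrices=True)  # U_full: 4 x 4, Vt_full: 6 x 6

# Zero-padded m x n matrix with Sigma in the top-left r x r block.
Sigma_full = np.zeros((m, n))
Sigma_full[:r, :r] = np.diag(s[:r])

A_rebuilt = U_full @ Sigma_full @ Vt_full
# Columns r+1..n of V-tilde are mapped to zero by A.
null_part = A @ Vt_full[r:, :].T
```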

When we cannot solve $y = A x$, we solve instead $\min_{x \in R^{n}} {‖ x ‖}_{2}^{2}$ such that $A^{⊺} A x = A^{⊺} y$
- This allows us to pick the minimum norm solution among potentially infinitely many solutions of the normal equations.
Recall: when $A \in R^{m \times n}$ is of rank $m$ (full row rank), then $x = A^{⊺} (A A^{⊺})^{- 1} y$
The solution of $\min_{x \in R^{n}} {‖ x ‖}_{2}^{2}$ such that $A^{⊺} A x = A^{⊺} y$ is $\hat{x} = V Σ^{- 1} U^{⊺} y = \sum_{i = 1}^{r} \frac{1}{σ_{i}} ⟨ y, u_{i} ⟩ v_{i}$ where $A = U Σ V^{⊺}$ is the SVD of $A$.
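
A minimal sketch of this minimum-norm solution, assuming NumPy. The function name `min_norm_least_squares` and the rank tolerance `tol` are choices made for this example.

```python
import numpy as np

def min_norm_least_squares(A, y, tol=1e-10):
    """Return x_hat = sum_i (1/sigma_i) <y, u_i> v_i over the non-zero singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))          # numerical rank
    coeffs = (U[:, :r].T @ y) / s[:r]        # <y, u_i> / sigma_i
    return Vt[:r, :].T @ coeffs

rng = np.random.default_rng(3)
# Rank-deficient 5 x 4 system: the normal equations have infinitely many solutions.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))
y = rng.standard_normal(5)
x_hat = min_norm_least_squares(A, y)
```

The result should satisfy the normal equations $A^{⊺} A \hat{x} = A^{⊺} y$ and agree with `np.linalg.lstsq(A, y, rcond=1e-10)[0]`, which also returns the minimum-norm least-squares solution.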

$A^{+} = V Σ^{- 1} U^{⊺}$ is called the pseudo-inverse, Lanczos inverse, or Moore-Penrose inverse of $A = U Σ V^{T}$ .
If $A$ is square invertible then $A^{+} = A^{- 1}$
If $m \geq n$ (tall and skinny matrix) of rank $n$ then $A^{+} = (A^{⊺} A)^{- 1} A^{⊺}$
If $m \leq n$ (short and fat matrix) of rank $m$ then $A^{+} = A^{⊺} (A A^{⊺})^{- 1}$
Note $A^{+}$ is as “close” to an inverse of $A$ as possible
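
The three special cases above can be compared against `np.linalg.pinv`. The small deterministic matrices below are chosen only for illustration. The last lines check $A A^{+} A = A$, one of the defining Moore-Penrose properties.

```python
import numpy as np

square = np.array([[2.0, 1.0, 0.0],
                   [1.0, 3.0, 1.0],
                   [0.0, 1.0, 4.0]])                 # invertible
tall = np.vstack([np.eye(3), np.ones((2, 3))])      # 5 x 3, rank 3
fat = tall.T                                        # 3 x 5, rank 3

pinv_square = np.linalg.pinv(square)
pinv_tall = np.linalg.pinv(tall)
pinv_fat = np.linalg.pinv(fat)

# Closed forms from the three cases.
inv_square = np.linalg.inv(square)
left_inverse = np.linalg.inv(tall.T @ tall) @ tall.T     # (A^T A)^{-1} A^T
right_inverse = fat.T @ np.linalg.inv(fat @ fat.T)       # A^T (A A^T)^{-1}

# A A^+ A = A holds in all cases, even when A^+ is not a true inverse.
check_tall = tall @ pinv_tall @ tall
check_fat = fat @ pinv_fat @ fat
```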

What if we observe $y = A x_{0} + e$ and we apply the pseudo inverse? $\hat{x} = A^{+} y$
We can separate the error analysis into two components: $\hat{x} - x_{0} = \underbrace{A^{+} A x_{0} - x_{0}}_{\text{null space error}} + \underbrace{A^{+} e}_{\text{noise error}}$
We will express the error in terms of the SVD $A = U Σ V^{⊺}$ with
- $\{v_{i}\}_{i = 1}^{r}$ orthobasis of $row (A)$, augmented by $\{v_{i}\}_{i = r + 1}^{n} \subset \ker A$ to form an orthobasis of $R^{n}$
- $\{u_{i}\}_{i = 1}^{r}$ orthobasis of $col (A)$, augmented by $\{u_{i}\}_{i = r + 1}^{m} \subset \ker A^{⊺}$ to form an orthobasis of $R^{m}$
The null space error is given by ${‖ A^{+} A x_{0} - x_{0} ‖}_{2}^{2} = \sum_{i = r + 1}^{n} {| {⟨ v_{i}, x_{0} ⟩}_{} |}^{2}$
The noise error is given by ${‖ A^{+} e ‖}_{2}^{2} = \sum_{i = 1}^{r} \frac{1}{σ_{i}^{2}} {| {⟨ e, u_{i} ⟩}_{} |}^{2}$
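
The two error formulas can be verified on a small synthetic problem. This sketch assumes NumPy, a rank-3 matrix built for the purpose, and an arbitrary noise level of 0.01. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r = 6, 5, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank r by construction
x0 = rng.standard_normal(n)
e = 0.01 * rng.standard_normal(m)
y = A @ x0 + e

U_full, s, Vt_full = np.linalg.svd(A, full_matrices=True)
U, V = U_full[:, :r], Vt_full[:r, :].T
A_pinv = V @ np.diag(1.0 / s[:r]) @ U.T

x_hat = A_pinv @ y
null_err = A_pinv @ A @ x0 - x0      # part of x0 lost in the null space of A
noise_err = A_pinv @ e               # noise amplified by 1 / sigma_i

# Closed-form expressions for the squared norms.
null_err_formula = np.sum((Vt_full[r:, :] @ x0) ** 2)
noise_err_formula = np.sum((U.T @ e) ** 2 / s[:r] ** 2)
```

Because the null space error lies in $\ker A$ and the noise error lies in $row (A)$, the two components are orthogonal and their squared norms add up to ${‖ \hat{x} - x_{0} ‖}_{2}^{2}$.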

How do we mitigate the effect of small singular values in reconstruction? $\hat{x} = V Σ^{- 1} U^{⊺} y = \sum_{i = 1}^{r} \frac{1}{σ_{i}} {⟨ y, u_{i} ⟩}_{} v_{i}$
Truncate the SVD to $r^{'} < r$: $A_{t} ≜ \sum_{i = 1}^{r^{'}} σ_{i} u_{i} v_{i}^{⊺}$ and $A_{t}^{+} = \sum_{i = 1}^{r^{'}} \frac{1}{σ_{i}} v_{i} u_{i}^{⊺}$
Reconstruct $\hat{x}_{t} = \sum_{i = 1}^{r^{'}} \frac{1}{σ_{i}} ⟨ y, u_{i} ⟩ v_{i} = A_{t}^{+} y$
Error analysis: ${‖ \hat{x}_{t} - x_{0} ‖}_{2}^{2} = \sum_{i = r + 1}^{n} {| ⟨ x_{0}, v_{i} ⟩ |}^{2} + \sum_{i = r^{'} + 1}^{r} {| ⟨ x_{0}, v_{i} ⟩ |}^{2} + \sum_{i = 1}^{r^{'}} \frac{1}{σ_{i}^{2}} {| ⟨ e, u_{i} ⟩ |}^{2}$
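
The trade-off in the truncated reconstruction can be seen on a constructed example. The sketch below assumes NumPy. The singular values, the signal `x0`, and the noise `e` are all chosen by hand so that one tiny singular value dominates the noise error. The helper name `truncated_pinv_solve` is hypothetical.

```python
import numpy as np

def truncated_pinv_solve(A, y, r_trunc):
    """Reconstruct with only the r_trunc largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U[:, :r_trunc].T @ y) / s[:r_trunc]
    return Vt[:r_trunc, :].T @ coeffs

rng = np.random.default_rng(6)
n = 4
U, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal factors
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.array([10.0, 5.0, 1.0, 1e-4])           # one very small singular value
A = U @ np.diag(sigma) @ V.T

x0 = V @ np.array([1.0, 1.0, 1.0, 0.1])            # little energy along v_4
e = 0.005 * U.sum(axis=1)                          # component 0.005 along every u_i
y = A @ x0 + e

x_full = truncated_pinv_solve(A, y, r_trunc=4)
x_trunc = truncated_pinv_solve(A, y, r_trunc=3)
err_full = np.linalg.norm(x_full - x0) ** 2
err_trunc = np.linalg.norm(x_trunc - x0) ** 2

# Error formula with r' = 3: dropped signal energy plus noise on the kept terms.
err_trunc_formula = 0.1**2 + 0.005**2 * np.sum(1.0 / sigma[:3] ** 2)
```

In this example, truncation gives up the small component of `x0` along $v_{4}$ but avoids dividing the noise by $σ_{4} = 10^{-4}$, so `err_trunc` is far smaller than `err_full`.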

Regularization means changing the problem to solve $\min_{x \in R^{n}} {‖ y - A x ‖}_{2}^{2} + λ {‖ x ‖}_{2}^{2}$ with $λ > 0$
The solution is $\hat{x} = (A^{⊺} A + λ I)^{- 1} A^{⊺} y = V (Σ^{2} + λ I)^{- 1} Σ U^{⊺} y$
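
The two expressions for the regularized solution can be compared directly. This is a sketch assuming NumPy; the function names and the value `lam = 0.1` are illustrative.

```python
import numpy as np

def ridge_normal_equations(A, y, lam):
    """Solve (A^T A + lam I) x = A^T y."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

def ridge_svd(A, y, lam):
    """Compute V (Sigma^2 + lam I)^{-1} Sigma U^T y from the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    filt = s / (s**2 + lam)          # replaces 1/sigma_i, bounded even as sigma_i -> 0
    return Vt.T @ (filt * (U.T @ y))

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))
y = rng.standard_normal(6)
lam = 0.1
x_ne = ridge_normal_equations(A, y, lam)
x_svd = ridge_svd(A, y, lam)
```

The factor $σ_{i} / (σ_{i}^{2} + λ)$ behaves like $1 / σ_{i}$ for large singular values and like $σ_{i} / λ$ for small ones, which is how regularization tames the noise amplification.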

We have seen several solutions to systems of linear equations $A x = y$ so far
- $A$ full column rank: $\hat{x} = (A^{⊺} A)^{- 1} A^{⊺} y$
- $A$ full row rank: $\hat{x} = A^{⊺} (A A^{⊺})^{- 1} y$
- Ridge regression: $\hat{x} = (A^{⊺} A + δ I)^{- 1} A^{⊺} y$
- Kernel regression: $\hat{x} = (K + δ I)^{- 1} y$
- Ridge regression in Hilbert space: $\hat{x} = (A^{⊺} A + δ G)^{- 1} A^{⊺} y$
Extension: constrained least-squares $\min_{x \in R^{n}} {‖ y - A x ‖}_{2}^{2}$ such that $x = B α$ for some $α$
- The solution is $\hat{x} = B (B^{⊺} A^{⊺} A B)^{- 1} B^{⊺} A^{⊺} y$
All these problems involve a symmetric positive definite system of equations.
- Many methods for solving such systems are based on matrix factorization
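
The constrained least-squares solution above reduces to an ordinary least-squares problem in $α$. The sketch below assumes NumPy and that $A B$ has full column rank, so that $B^{⊺} A^{⊺} A B$ is symmetric positive definite. The matrix `B` is a made-up example forcing `x` to be piecewise constant.

```python
import numpy as np

def constrained_least_squares(A, y, B):
    """Minimize ||y - A x||^2 over x = B alpha; assumes A @ B has full column rank."""
    AB = A @ B
    alpha = np.linalg.solve(AB.T @ AB, AB.T @ y)   # symmetric positive definite system
    return B @ alpha

rng = np.random.default_rng(8)
A = rng.standard_normal((8, 5))
y = rng.standard_normal(8)
# Hypothetical constraint: x[0] = x[1] and x[2] = x[3] = x[4].
B = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0],
              [0.0, 1.0]])
x_hat = constrained_least_squares(A, y, B)
```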

Diagonal system
- $A \in R^{n \times n}$ invertible and diagonal
- $O (n)$ complexity
Orthogonal system
- $A \in R^{n \times n}$ invertible and orthogonal
- $O (n^{2})$ complexity
Lower triangular system
- $A \in R^{n \times n}$ invertible and lower triangular
- $O (n^{2})$ complexity
General strategy: factorize $A$ to recover some of the structures above
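
The three structured solvers, and one possible instance of the general strategy, can be sketched as follows. This assumes NumPy. The Cholesky factorization $S = L L^{⊺}$ is used here as one example of a factorization for a symmetric positive definite system, not necessarily the one developed later in the course. The function names are illustrative.

```python
import numpy as np

def solve_diagonal(d, y):
    """Solve diag(d) x = y in O(n): one division per entry."""
    return y / d

def solve_orthogonal(Q, y):
    """Solve Q x = y in O(n^2): the inverse of Q is Q^T."""
    return Q.T @ y

def solve_lower_triangular(L, y):
    """Solve L x = y by forward substitution in O(n^2)."""
    n = L.shape[0]
    x = np.zeros(n)
    for i in range(n):
        x[i] = (y[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

def solve_spd_cholesky(S, b):
    """Solve S x = b for symmetric positive definite S via S = L L^T."""
    L = np.linalg.cholesky(S)
    w = solve_lower_triangular(L, b)               # L w = b
    # L^T x = w is upper triangular; reversing rows and columns makes it lower triangular.
    x_rev = solve_lower_triangular(L.T[::-1, ::-1], w[::-1])
    return x_rev[::-1]

rng = np.random.default_rng(9)
A = rng.standard_normal((6, 4))
S = A.T @ A + 0.5 * np.eye(4)                      # ridge-type SPD system
b = rng.standard_normal(4)
x = solve_spd_cholesky(S, b)
```

Once the factorization is available, each solve costs only two triangular substitutions.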