Stability and Numerical Aspects of Least Squares

Logistics

General announcements
- Assignment 6 to be posted… (grading traffic jam)
- 4 lectures left! (No lecture on Wednesday November 24, 2021)
Midterm 2
- 99% graded, grades released after Thanksgiving weekend
- Midterm solution during office hours on Tuesday November 23, 2021
Assignment 5
- Grading finalized
Assignment 6 and 7
- Posted this week, due on Monday December 6 and Monday December 13, respectively

What if we observe $y = A x_{0} + e$ and apply the pseudo-inverse $\hat{x} = A^{+} y$?
We can separate the error analysis into two components: $\hat{x} - x_{0} = \underbrace{A^{+} A x_{0} - x_{0}}_{\text{null space error}} + \underbrace{A^{+} e}_{\text{noise error}}$
We will express the error in terms of the SVD $A = U Σ V^{⊺}$ with
- $\{v_{i}\}_{i = 1}^{r}$ orthobasis of $\operatorname{row}(A)$, augmented by $\{v_{i}\}_{i = r + 1}^{n} \subset \ker A$ to form an orthobasis of $R^{n}$
- $\{u_{i}\}_{i = 1}^{r}$ orthobasis of $\operatorname{col}(A)$, augmented by $\{u_{i}\}_{i = r + 1}^{m} \subset \ker A^{⊺}$ to form an orthobasis of $R^{m}$
The null space error is given by $‖ A^{+} A x_{0} - x_{0} ‖_{2}^{2} = \sum_{i = r + 1}^{n} | ⟨ x_{0}, v_{i} ⟩ |^{2}$
The noise error is given by $‖ A^{+} e ‖_{2}^{2} = \sum_{i = 1}^{r} \frac{1}{σ_{i}^{2}} | ⟨ e, u_{i} ⟩ |^{2}$
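The two error formulas above can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated rank-deficient matrix; the sizes, the noise level, and all variable names are illustrative choices, not part of the lecture. Because the null space error lies in $\ker A$ and the noise error lies in $\operatorname{row}(A)$, the two components are orthogonal and their squared norms add up to the total squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a rank-deficient 6x5 matrix of rank r = 3.
m, n, r = 6, 5, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

x0 = rng.standard_normal(n)
e = 0.01 * rng.standard_normal(m)
y = A @ x0 + e

# Full SVD: the columns of U span R^m, the rows of Vt span R^n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Pseudo-inverse built from the r nonzero singular values only.
A_pinv = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T
x_hat = A_pinv @ y

# Null space error: energy of x0 along v_{r+1}, ..., v_n.
null_err = np.sum((Vt[r:] @ x0) ** 2)
# Noise error: energy of e along u_i, amplified by 1 / sigma_i^2.
noise_err = np.sum((U[:, :r].T @ e) ** 2 / s[:r] ** 2)

# Total squared reconstruction error.
total_err = np.sum((x_hat - x0) ** 2)
```

Under these assumptions, `total_err` should match `null_err + noise_err` up to floating-point rounding.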

How do we mitigate the effect of small singular values in reconstruction? $\hat{x} = V Σ^{- 1} U^{⊺} y = \sum_{i = 1}^{r} \frac{1}{σ_{i}} ⟨ y, u_{i} ⟩ v_{i}$
Truncate the SVD to $r^{'} < r$: $A_{t} ≜ \sum_{i = 1}^{r^{'}} σ_{i} u_{i} v_{i}^{⊺}$, so that $A_{t}^{+} = \sum_{i = 1}^{r^{'}} \frac{1}{σ_{i}} v_{i} u_{i}^{⊺}$
Reconstruct $\hat{x}_{t} = \sum_{i = 1}^{r^{'}} \frac{1}{σ_{i}} ⟨ y, u_{i} ⟩ v_{i} = A_{t}^{+} y$
Error analysis: $‖ \hat{x}_{t} - x_{0} ‖_{2}^{2} = \sum_{i = r + 1}^{n} | ⟨ x_{0}, v_{i} ⟩ |^{2} + \sum_{i = r^{'} + 1}^{r} | ⟨ x_{0}, v_{i} ⟩ |^{2} + \sum_{i = 1}^{r^{'}} \frac{1}{σ_{i}^{2}} | ⟨ e, u_{i} ⟩ |^{2}$
The first term is the null space error, the second is the price of truncation, and the third is the noise error, which no longer involves the smallest singular values.
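The truncated reconstruction above can be sketched as follows. This is an illustration only, assuming NumPy; the helper name `truncated_pinv_solve`, the prescribed singular values, and the noise level are hypothetical choices made to produce an ill-conditioned example.

```python
import numpy as np

def truncated_pinv_solve(A, y, r_trunc):
    """Reconstruct x from y using only the r_trunc largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Coefficients <y, u_i> / sigma_i for i = 1, ..., r_trunc.
    coeffs = (U[:, :r_trunc].T @ y) / s[:r_trunc]
    return Vt[:r_trunc].T @ coeffs

rng = np.random.default_rng(1)

# Hypothetical ill-conditioned 8x4 matrix with prescribed singular values.
U, _ = np.linalg.qr(rng.standard_normal((8, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
sigma = np.array([1.0, 0.5, 1e-4, 1e-6])
A = U @ np.diag(sigma) @ V.T

x0 = rng.standard_normal(4)
e = 1e-3 * rng.standard_normal(8)
y = A @ x0 + e

x_full = truncated_pinv_solve(A, y, 4)   # plain pseudo-inverse
x_trunc = truncated_pinv_solve(A, y, 2)  # drop the two smallest singular values

err_full = np.linalg.norm(x_full - x0)
err_trunc = np.linalg.norm(x_trunc - x0)
```

With noise of this size, dividing by singular values as small as $10^{-6}$ would typically make `err_full` far larger than `err_trunc`, although the exact values depend on the random draw.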

Regularization means changing the problem to solve: $\min_{x \in R^{n}} ‖ y - A x ‖_{2}^{2} + λ ‖ x ‖_{2}^{2}$ with $λ > 0$
The solution is $\hat{x} = (A^{⊺} A + λ I)^{- 1} A^{⊺} y = V (Σ^{2} + λ I)^{- 1} Σ U^{⊺} y$
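The two expressions for the regularized solution can be compared numerically. The sketch below assumes NumPy; the sizes and the value of `lam` are arbitrary illustrative choices. In the SVD form, each coefficient $⟨ y, u_{i} ⟩$ is multiplied by $σ_{i} / (σ_{i}^{2} + λ)$ instead of $1 / σ_{i}$, and this factor never exceeds $1 / (2 \sqrt{λ})$, so small singular values no longer blow up the noise.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 10, 4  # hypothetical sizes
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
lam = 0.1  # hypothetical regularization weight

# Normal-equations form: (A^T A + lam I)^{-1} A^T y.
x_normal = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# SVD form: each coefficient <y, u_i> is scaled by sigma_i / (sigma_i^2 + lam).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
filt = s / (s**2 + lam)
x_svd = Vt.T @ (filt * (U.T @ y))
```

Both `x_normal` and `x_svd` should agree up to rounding.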

We have seen several solutions to systems of linear equations $A x = y$ so far
- $A$ full column rank: $\hat{x} = (A^{⊺} A)^{- 1} A^{⊺} y$
- $A$ full row rank: $\hat{x} = A^{⊺} (A A^{⊺})^{- 1} y$
- Ridge regression: $\hat{x} = (A^{⊺} A + δ I)^{- 1} A^{⊺} y$
- Kernel regression: $\hat{x} = (K + δ I)^{- 1} y$
- Ridge regression in Hilbert space: $\hat{x} = (A^{⊺} A + δ G)^{- 1} A^{⊺} y$
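As a sanity check on the first two closed forms in the list above, the following sketch (assuming NumPy and random matrices, which generically have full rank) compares them with the pseudo-inverse returned by `np.linalg.pinv`. The sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Tall matrix (full column rank): least-squares solution.
A_tall = rng.standard_normal((8, 3))
y_tall = rng.standard_normal(8)
x_ls = np.linalg.solve(A_tall.T @ A_tall, A_tall.T @ y_tall)

# Wide matrix (full row rank): minimum-norm solution.
A_wide = rng.standard_normal((3, 8))
y_wide = rng.standard_normal(3)
x_mn = A_wide.T @ np.linalg.solve(A_wide @ A_wide.T, y_wide)

# Reference values from the pseudo-inverse.
x_ls_ref = np.linalg.pinv(A_tall) @ y_tall
x_mn_ref = np.linalg.pinv(A_wide) @ y_wide
```

In both cases the matrix being inverted ($A^{⊺} A$ or $A A^{⊺}$) is symmetric positive definite.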
Extension: constrained least-squares $\min_{x \in R^{n}} ‖ y - A x ‖_{2}^{2}$ s.t. $x = B α$ for some $α$
- The solution is $\hat{x} = B (B^{⊺} A^{⊺} A B)^{- 1} B^{⊺} A^{⊺} y$
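The constrained solution above amounts to an ordinary least-squares problem in $α$ with matrix $A B$, followed by mapping back through $B$. A minimal sketch, assuming NumPy, random data, and that $A B$ has full column rank (the sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 12, 6, 3  # hypothetical sizes, with k < n
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, k))
y = rng.standard_normal(m)

# Closed form: x = B (B^T A^T A B)^{-1} B^T A^T y.
AB = A @ B
alpha_closed = np.linalg.solve(AB.T @ AB, AB.T @ y)
x_closed = B @ alpha_closed

# Equivalent view: ordinary least squares in alpha with matrix A B.
alpha_lstsq, *_ = np.linalg.lstsq(AB, y, rcond=None)
x_lstsq = B @ alpha_lstsq
```

The two routes should give the same $\hat{x}$ up to rounding, and the system solved in the closed form is again symmetric positive definite.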
All these problems involve a symmetric positive definite system of equations.
- Many methods for solving such systems are based on matrix factorization

Diagonal system
- $A \in R^{n \times n}$ invertible and diagonal
- $O (n)$ complexity
Orthogonal system
- $A \in R^{n \times n}$ invertible and orthogonal
- $O (n^{2})$ complexity
Lower triangular system
- $A \in R^{n \times n}$ invertible and lower triangular
- $O (n^{2})$ complexity
General strategy: factorize $A$ to recover some of the structures above
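The three structured solves and the factorization strategy can be sketched as follows, assuming NumPy. The Cholesky factorization $M = L L^{⊺}$ is used here as one example of a factorization for a symmetric positive definite system; it is an illustrative choice, and the function names are hypothetical.

```python
import numpy as np

def solve_diagonal(d, y):
    """Solve diag(d) x = y in O(n): one division per entry."""
    return y / d

def solve_orthogonal(Q, y):
    """Solve Q x = y in O(n^2): the inverse of Q is its transpose."""
    return Q.T @ y

def solve_lower_triangular(L, y):
    """Solve L x = y in O(n^2) by forward substitution."""
    n = L.shape[0]
    x = np.zeros(n)
    for i in range(n):
        # Entries x[0..i-1] are already known; isolate x[i].
        x[i] = (y[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

def solve_spd(M, b):
    """Solve M x = b for symmetric positive definite M via M = L L^T."""
    L = np.linalg.cholesky(M)         # lower triangular factor
    z = solve_lower_triangular(L, b)  # first solve L z = b
    # L^T x = z is upper triangular; reversing the order of rows and
    # columns turns it into a lower triangular system.
    x_rev = solve_lower_triangular(L.T[::-1, ::-1], z[::-1])
    return x_rev[::-1]

# Usage: the ridge regression system (A^T A + delta I) x = A^T y.
rng = np.random.default_rng(5)
A = rng.standard_normal((10, 4))
y = rng.standard_normal(10)
delta = 0.1  # hypothetical regularization weight
M = A.T @ A + delta * np.eye(4)
x_hat = solve_spd(M, A.T @ y)
```

Once the factor $L$ is available, each solve costs two triangular substitutions, i.e. $O (n^{2})$ operations; computing the factorization itself is the more expensive step.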