Deterministic Least Squares

Dr. Matthieu R Bloch

Tuesday August 30, 2022

Today in ECE 6555

  • Don’t forget
    1. Office hours today 12pm (in person in TSRB and on zoom)
    2. Problem set 1 posted and due Thursday September 8, 2022 on Gradescope
    3. Check out the self assessment
  • Announcements
    • Mathematics of ECE workshops (first session on linear algebra on Wednesday August 31, 2022)
  • Today’s plan
    • Normal equations
    • Deterministic least squares
    • Why? Geometric intuition is incredibly powerful and will carry over to more complex settings
  • Questions?

Normal equations

  • In most engineering problems \(\vecy = \matH\vecx\) has no solution because of (unknown) noise! \[ \vecy = \matH\vecx + \vecv \]

  • We need to introduce a criterion that defines the best approximate solution

  • The least squares solution \(\hat{\vecx}\in\bbR^n\) is \(\hat{\vecx}=\argmin_{\vecu\in\bbR^n}\norm[2]{\vecy-\matH\vecu}^2\)

  • One can consider other norms (\(\norm[1]{\cdot}\) for instance) but \(\norm[2]{\cdot}\) has appealing analytical properties

  • It will be convenient to introduce the cost function \(J(\vecx)\eqdef \norm[2]{\vecy-\matH\vecx}^2\)

  • A vector \(\vecx_0\) is a minimizer of \(J(\cdot)\) if and only if it satisfies the consistent normal equations \[ \matH^T\matH\vecx_0 = \matH^T\vecy \] The resulting unique minimum value is \(J(\vecx_0)=\norm[2]{\vecy}^2-\norm[2]{\matH\vecx_0}^2\) (see the numerical check at the end of this slide)

  • Homework problem: \(\text{Im}(\matH^T\matH)=\text{Im}(\matH^T)\)
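
  • Numerical sanity check (a minimal NumPy sketch; the sizes and the random \(\matH\), \(\vecy\) below are purely illustrative): the solution of the normal equations matches np.linalg.lstsq and attains \(J(\vecx_0)=\norm[2]{\vecy}^2-\norm[2]{\matH\vecx_0}^2\)

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 20, 4                       # illustrative sizes, m > n
    H = rng.standard_normal((m, n))    # a generic H has full column rank
    y = rng.standard_normal(m)

    # Solve the normal equations H^T H x0 = H^T y
    x0 = np.linalg.solve(H.T @ H, H.T @ y)

    # Compare against NumPy's least squares routine
    x_ls, *_ = np.linalg.lstsq(H, y, rcond=None)
    assert np.allclose(x0, x_ls)

    # Minimum cost J(x0) = ||y - H x0||^2 equals ||y||^2 - ||H x0||^2
    J = np.linalg.norm(y - H @ x0) ** 2
    assert np.isclose(J, np.linalg.norm(y) ** 2 - np.linalg.norm(H @ x0) ** 2)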

Normal equations

    • if \(\matH\) has full column rank \(n\) then \(\vecx_0 = (\matH^T\matH)^{-1}\matH^T\vecy\) is unique
    • if \(\matH\) does not have full column rank, there are multiple solutions differing by a vector in \(\text{Ker}(\matH)\)
  • When \(\matH\) has full column rank, note that \(\hat{\vecy}=\matH(\matH^T\matH)^{-1}\matH^T\vecy\)

  • What can we say when \(\matH\) does not have full column rank? More soon!

    • Homework: singular value decomposition
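
  • To illustrate the rank-deficient case (a minimal sketch with an artificially dependent column; the data is illustrative): least squares solutions differ by vectors in \(\text{Ker}(\matH)\), and np.linalg.pinv, which relies on the SVD from the homework, returns the minimum-norm one

    import numpy as np

    rng = np.random.default_rng(1)
    m = 20
    H = rng.standard_normal((m, 3))
    H = np.column_stack([H, H[:, 0] + H[:, 1]])   # 4th column is dependent: rank 3 < 4
    y = rng.standard_normal(m)

    # Minimum-norm least squares solution via the (SVD-based) pseudo-inverse
    x_min = np.linalg.pinv(H) @ y

    # Adding any kernel direction leaves the cost J unchanged
    kernel_dir = np.array([1.0, 1.0, 0.0, -1.0])  # H @ kernel_dir = 0 by construction
    assert np.allclose(H @ kernel_dir, 0)
    x_other = x_min + 2.5 * kernel_dir
    assert np.isclose(np.linalg.norm(y - H @ x_min), np.linalg.norm(y - H @ x_other))
    assert np.linalg.norm(x_min) <= np.linalg.norm(x_other)   # pinv picks the shortest solution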

A geometric perspective

  • Recall two suspicious-looking results: \(J(\vecx_0)=\norm[2]{\vecy-\matH\vecx_0}^2=\norm[2]{\vecy}^2-\norm[2]{\matH\vecx_0}^2\) and \(\matH^T(\matH\vecx_0-\vecy)=\mathbf{0}\)

  • Let \(\calW\) be a linear subspace of \(\calV\subset\bbR^n\). An orthogonal projection of \(y\in\calV\) onto \(\calW\) is \(\hat{y}\in\calW\) such that \(y-\hat{y}\in\calW^\perp\).

  • The orthogonal projection exists and is unique

  • Let \(\calW\) be a linear subspace of \(\calV\subset\bbR^n\) and let \(y\in\calV\) have orthogonal projection \(\hat{y}\) onto \(\calW\).

    Then \(\forall z\in\calW\), \(\norm[2]{y-\hat{y}}^2\leq\norm[2]{y-z}^2\).

  • This explains the normal equations very intuitively!

  • When \(\matH\) has full column rank, the matrix \(P_\matH\eqdef \matH(\matH^T\matH)^{-1}\matH^T\) is the orthogonal projection matrix onto \(\text{Im}(\matH)\) (verified numerically below)

    • Homework: more on this
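
  • Quick NumPy check of the projection interpretation (arbitrary full-column-rank \(\matH\) and \(\vecy\), for illustration only): \(P_\matH\) is symmetric and idempotent, it fixes \(\text{Im}(\matH)\), and the residual \(\vecy-P_\matH\vecy\) is orthogonal to \(\text{Im}(\matH)\), which is exactly the normal equations

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 15, 3
    H = rng.standard_normal((m, n))        # generic, hence full column rank
    y = rng.standard_normal(m)

    P = H @ np.linalg.inv(H.T @ H) @ H.T   # orthogonal projector onto Im(H)

    assert np.allclose(P, P.T)             # symmetric
    assert np.allclose(P @ P, P)           # idempotent
    assert np.allclose(P @ H, H)           # leaves Im(H) unchanged
    assert np.allclose(H.T @ (y - P @ y), 0)   # residual orthogonal to Im(H)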

Geometry in action

    • Consider a full rank matrix \(\matH\in\bbR^{m\times n}\) with \(m\gg n\)

    • Let \(\hat{\vecx}^{(n)}\) be the least squares solution of \(\vecy\approx \matH\vecx\)

    • Assume we get one more input (preserving full rank) so that we want to solve \[ \vecy\approx \left[\begin{array}{cc}\matH &\vech_{n+1}\end{array}\right] \left[\begin{array}{c}\vecx\\x_{n+1}\end{array}\right] \]

      • Can we do this efficiently without recomputing everything?
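
    • One possible approach, sketched here under the assumption that \((\matH^T\matH)^{-1}\) has been kept from the previous solve: project \(\vech_{n+1}\) onto \(\text{Im}(\matH)^\perp\), compute the new coefficient from that component, and correct the old coefficients; the data below is illustrative and the result is checked against a full recomputation

    import numpy as np

    rng = np.random.default_rng(3)
    m, n = 50, 5
    H = rng.standard_normal((m, n))
    h = rng.standard_normal(m)            # new column h_{n+1}
    y = rng.standard_normal(m)

    G_inv = np.linalg.inv(H.T @ H)        # (H^T H)^{-1}, assumed already available
    x_old = G_inv @ (H.T @ y)             # previous least squares solution

    # Component of h orthogonal to Im(H)
    h_perp = h - H @ (G_inv @ (H.T @ h))
    # New scalar coefficient, then correction of the old coefficients
    a = (h_perp @ y) / (h_perp @ h_perp)
    x_new = np.concatenate([x_old - (G_inv @ (H.T @ h)) * a, [a]])

    # Sanity check against recomputing from scratch
    x_full, *_ = np.linalg.lstsq(np.column_stack([H, h]), y, rcond=None)
    assert np.allclose(x_new, x_full)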

Variations on a theme

  • There are many variations on the least squares minimization problem

  • Weighted least squares \[ J(\vecx)\eqdef \norm[\matW]{\vecy-\matH\vecx}^2\eqdef (\vecy-\matH\vecx)^T\matW(\vecy-\matH\vecx) \] for some symmetric positive definite matrix \(\matW\)

  • Regularized least squares \[ J(\vecx)\eqdef (\vecy-\matH\vecx)^T\matW(\vecy-\matH\vecx) + (\vecx-\vecx_0)^T\mathbf{\Pi}(\vecx-\vecx_0) \] for some symmetric positive definite matrices \(\matW\) and \(\mathbf{\Pi}\)
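
  • For reference, setting the gradient of each cost to zero yields closed forms analogous to the normal equations; the sketch below (with arbitrary positive definite \(\matW\), \(\mathbf{\Pi}\) and random data, for illustration) checks the resulting stationarity conditions

    import numpy as np

    rng = np.random.default_rng(4)
    m, n = 30, 4
    H = rng.standard_normal((m, n))
    y = rng.standard_normal(m)
    x0 = rng.standard_normal(n)

    # Illustrative symmetric positive definite W and Pi
    A = rng.standard_normal((m, m)); W = A @ A.T + np.eye(m)
    B = rng.standard_normal((n, n)); Pi = B @ B.T + np.eye(n)

    # Weighted LS: minimizer of (y - Hx)^T W (y - Hx)
    x_w = np.linalg.solve(H.T @ W @ H, H.T @ W @ y)
    assert np.allclose(H.T @ W @ (y - H @ x_w), 0, atol=1e-6)     # gradient vanishes (up to round-off)

    # Regularized LS: weighted cost plus (x - x0)^T Pi (x - x0)
    x_r = np.linalg.solve(H.T @ W @ H + Pi, H.T @ W @ y + Pi @ x0)
    assert np.allclose(H.T @ W @ (y - H @ x_r), Pi @ (x_r - x0))  # gradient vanishes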

Recursive least squares

    • Suppose at step \(i-1\) we have solved the LS problem \(\matH_{i-1}\vecx\approx \vecy_{i-1}\) with \[ \matH_{i-1}=\left[\begin{array}{ccc}-&\vech_0^T&-\\&\vdots&\\-&\vech_{i-1}^T&-\end{array}\right]\qquad \vecy_{i-1}=\left[\begin{array}{c}y_0\\\vdots\\y_{i-1}\end{array}\right] \]
    • This is different from (and more interesting than) the order-recursive least squares problem
    • We now want to solve \(\matH_{i}\vecx\approx \vecy_{i}\) with \[ \matH_{i}=\left[\begin{array}{ccc}&\matH_{i-1}&\\-&\vech_i^T&-\end{array}\right]\qquad \vecy_{i}=\left[\begin{array}{c}\vecy_{i-1}\\y_{i}\end{array}\right] \]
    • Can we do this efficiently?
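
    • One standard answer, sketched here rather than derived: propagate \(P_i\eqdef(\matH_i^T\matH_i)^{-1}\) with a rank-one update (matrix inversion lemma), so that each new row only costs \(O(n^2)\) operations; the data below is arbitrary and simply checks the recursion against the batch solution

    import numpy as np

    rng = np.random.default_rng(5)
    n, steps = 3, 40
    rows = rng.standard_normal((steps, n))    # the rows h_i^T, one per time step
    ys = rng.standard_normal(steps)

    # Initialize from the first n rows solved in batch (assumed invertible)
    H0, y0 = rows[:n], ys[:n]
    P = np.linalg.inv(H0.T @ H0)              # P_i = (H_i^T H_i)^{-1}
    x = P @ (H0.T @ y0)

    for i in range(n, steps):
        h, yi = rows[i], ys[i]
        Ph = P @ h
        gain = Ph / (1.0 + h @ Ph)            # gain vector, via the matrix inversion lemma
        x = x + gain * (yi - h @ x)           # correct the estimate with the new residual
        P = P - np.outer(gain, Ph)            # rank-one update of (H_i^T H_i)^{-1}

    # Check against solving the full problem in one shot
    x_batch, *_ = np.linalg.lstsq(rows, ys, rcond=None)
    assert np.allclose(x, x_batch)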