Deterministic Least Squares

Dr. Matthieu R Bloch

Tuesday August 30, 2022

Today in ECE 6555

  • Don’t forget
    1. Office hours today 12pm (in person in TSRB and on zoom)
    2. Problem set 1 posted and due Thursday September 8, 2022 on Gradescope
    3. Check out the self assessment
  • Announcements
    • Mathematics of ECE workshops (first session on linear algebra on Wednesday August 31, 2022)
  • Today’s plan
    • Normal equations
    • Deterministic least squares
    • Why? Geometric intuition is incredibly powerful and will carry over to more complex settings
  • Questions?

Normal equations

  • In most engineering problems \(\vecy = \matH\vecx\) has no solution because of (unknown) noise! \[ \vecy = \matH\vecx + \vecv \]

  • We need to introduce a criterion that defines the best approximate solution

  • The least squares solution \(\hat{\vecx}\in\bbR^n\) is \(\hat{\vecx}=\argmin_{\vecu\in\bbR^n}\norm[2]{\vecy-\matH\vecu}^2\)

  • One can consider other norms (\(\norm[1]{\cdot}\) for instance) but \(\norm[2]{\cdot}\) has appealing analytical properties

  • It will be convenient to introduce the cost function \(J(\vecx)\eqdef \norm[2]{\vecy-\matH\vecx}^2\)

  • A vector \(\vecx_0\) is a minimizer of \(J(\cdot)\) if and only if it satisfies the consistent normal equations \[ \matH^T\matH\vecx_0 = \matH^T\vecy \] The resulting unique minimum value is \(J(\vecx_0)=\norm[2]{\vecy}^2-\norm[2]{\matH\vecx_0}^2\) (see the numerical check at the end of this slide)

  • Homework problem: \(\text{Im}(\matH^T\matH)=\text{Im}(\matH^T)\)
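
  • Numerical sanity check (a minimal NumPy sketch; the sizes and the random \(\matH\), \(\vecy\) below are purely illustrative): the solution of the normal equations matches np.linalg.lstsq and attains \(J(\vecx_0)=\norm[2]{\vecy}^2-\norm[2]{\matH\vecx_0}^2\)

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 20, 4                       # illustrative sizes, m > n
    H = rng.standard_normal((m, n))    # a generic H has full column rank
    y = rng.standard_normal(m)

    # Solve the normal equations H^T H x0 = H^T y
    x0 = np.linalg.solve(H.T @ H, H.T @ y)

    # Compare against NumPy's least squares routine
    x_ls, *_ = np.linalg.lstsq(H, y, rcond=None)
    assert np.allclose(x0, x_ls)

    # Minimum cost J(x0) = ||y - H x0||^2 equals ||y||^2 - ||H x0||^2
    J = np.linalg.norm(y - H @ x0) ** 2
    assert np.isclose(J, np.linalg.norm(y) ** 2 - np.linalg.norm(H @ x0) ** 2)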

Normal equations

    • if \(\matH\) has full column rank \(n\) then \(\vecx_0 = (\matH^T\matH)^{-1}\matH^T\vecy\) is unique
    • if \(\matH\) does not have full column rank, there are multiple solutions differing by a vector in \(\text{Ker}(\matH)\)
  • When \(\matH\) has full column rank, note that \(\hat{\vecy}=\matH(\matH^T\matH)^{-1}\matH^T\vecy\)

  • What can we say when \(\matH\) does not have full column rank? More soon!

    • Homework: singular value decomposition
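
  • To illustrate the rank-deficient case (a minimal sketch with an artificially dependent column; the data is illustrative): least squares solutions differ by vectors in \(\text{Ker}(\matH)\), and np.linalg.pinv, which relies on the SVD from the homework, returns the minimum-norm one

    import numpy as np

    rng = np.random.default_rng(1)
    m = 20
    H = rng.standard_normal((m, 3))
    H = np.column_stack([H, H[:, 0] + H[:, 1]])   # 4th column is dependent: rank 3 < 4
    y = rng.standard_normal(m)

    # Minimum-norm least squares solution via the (SVD-based) pseudo-inverse
    x_min = np.linalg.pinv(H) @ y

    # Adding any kernel direction leaves the cost J unchanged
    kernel_dir = np.array([1.0, 1.0, 0.0, -1.0])  # H @ kernel_dir = 0 by construction
    assert np.allclose(H @ kernel_dir, 0)
    x_other = x_min + 2.5 * kernel_dir
    assert np.isclose(np.linalg.norm(y - H @ x_min), np.linalg.norm(y - H @ x_other))
    assert np.linalg.norm(x_min) <= np.linalg.norm(x_other)   # pinv picks the shortest solution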

A geometric perspective

  • Recall two suspicious-looking results: \(J(\vecx_0)=\norm[2]{\vecy-\matH\vecx_0}^2=\norm[2]{\vecy}^2-\norm[2]{\matH\vecx_0}^2\) and \(\matH^T(\matH\vecx_0-\vecy)=\mathbf{0}\)

  • Let \(\calW\) be a linear subspace of \(\calV\subset\bbR^n\). An orthogonal projection of \(y\in\calV\) onto \(\calW\) is \(\hat{y}\in\calW\) such that \(y-\hat{y}\in\calW^\perp\).

  • The orthogonal projection exists and is unique

  • Let \(\calW\) be a linear subspace of \(\calV\subset\bbR^n\) and let \(y\in\calV\) have orthogonal projection \(\hat{y}\) onto \(\calW\).

    Then \(\forall z\in\calW\), \(\norm[2]{y-\hat{y}}^2\leq\norm[2]{y-z}^2\).

  • This explains the normal equations very intuitively!

  • When \(\matH\) has full column rank, the matrix \(P_\matH\eqdef \matH(\matH^T\matH)^{-1}\matH^T\) is the orthogonal projection matrix onto \(\text{Im}(\matH)\) (verified numerically below)

    • Homework: more on this
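
  • Quick NumPy check of the projection interpretation (arbitrary full-column-rank \(\matH\) and \(\vecy\), for illustration only): \(P_\matH\) is symmetric and idempotent, it fixes \(\text{Im}(\matH)\), and the residual \(\vecy-P_\matH\vecy\) is orthogonal to \(\text{Im}(\matH)\), which is exactly the normal equations

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 15, 3
    H = rng.standard_normal((m, n))        # generic, hence full column rank
    y = rng.standard_normal(m)

    P = H @ np.linalg.inv(H.T @ H) @ H.T   # orthogonal projector onto Im(H)

    assert np.allclose(P, P.T)             # symmetric
    assert np.allclose(P @ P, P)           # idempotent
    assert np.allclose(P @ H, H)           # leaves Im(H) unchanged
    assert np.allclose(H.T @ (y - P @ y), 0)   # residual orthogonal to Im(H)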

Geometry in action

    • Consider a full rank matrix \(\matH\in\bbR^{m\times n}\) with \(m\gg n\)

    • Let \(\hat{\vecx}^{(n)}\) be the least squares solution of \(\vecy\approx \matH\vecx\)

    • Assume we get one more input (preserving full rank) so that we want to solve \[ \vecy\approx \left[\begin{array}{cc}\matH &\vech_{n+1}\end{array}\right] \left[\begin{array}{c}\vecx\\x_{n+1}\end{array}\right] \]

      • Can we do this efficiently without recomputing everything?
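
    • One possible approach, sketched here under the assumption that \((\matH^T\matH)^{-1}\) has been kept from the previous solve: project \(\vech_{n+1}\) onto \(\text{Im}(\matH)^\perp\), compute the new coefficient from that component, and correct the old coefficients; the data below is illustrative and the result is checked against a full recomputation

    import numpy as np

    rng = np.random.default_rng(3)
    m, n = 50, 5
    H = rng.standard_normal((m, n))
    h = rng.standard_normal(m)            # new column h_{n+1}
    y = rng.standard_normal(m)

    G_inv = np.linalg.inv(H.T @ H)        # (H^T H)^{-1}, assumed already available
    x_old = G_inv @ (H.T @ y)             # previous least squares solution

    # Component of h orthogonal to Im(H)
    h_perp = h - H @ (G_inv @ (H.T @ h))
    # New scalar coefficient, then correction of the old coefficients
    a = (h_perp @ y) / (h_perp @ h_perp)
    x_new = np.concatenate([x_old - (G_inv @ (H.T @ h)) * a, [a]])

    # Sanity check against recomputing from scratch
    x_full, *_ = np.linalg.lstsq(np.column_stack([H, h]), y, rcond=None)
    assert np.allclose(x_new, x_full)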

Variations on a theme

  • There are many variations on the least squares minimization problem

  • Weighted least squares \[ J(\vecx)\eqdef \norm[\matW]{\vecy-\matH\vecx}^2\eqdef (\vecy-\matH\vecx)^T\matW(\vecy-\matH\vecx) \] for some symmetric positive definite matrix \(\matW\)

  • Regularized least squares \[ J(\vecx)\eqdef (\vecy-\matH\vecx)^T\matW(\vecy-\matH\vecx) + (\vecx-\vecx_0)^T\mathbf{\Pi}(\vecx-\vecx_0) \] for some symmetric positive definite matrices \(\matW\) and \(\mathbf{\Pi}\)
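
  • For reference, setting the gradient of each cost to zero yields closed forms analogous to the normal equations; the sketch below (with arbitrary positive definite \(\matW\), \(\mathbf{\Pi}\) and random data, for illustration) checks the resulting stationarity conditions

    import numpy as np

    rng = np.random.default_rng(4)
    m, n = 30, 4
    H = rng.standard_normal((m, n))
    y = rng.standard_normal(m)
    x0 = rng.standard_normal(n)

    # Illustrative symmetric positive definite W and Pi
    A = rng.standard_normal((m, m)); W = A @ A.T + np.eye(m)
    B = rng.standard_normal((n, n)); Pi = B @ B.T + np.eye(n)

    # Weighted LS: minimizer of (y - Hx)^T W (y - Hx)
    x_w = np.linalg.solve(H.T @ W @ H, H.T @ W @ y)
    assert np.allclose(H.T @ W @ (y - H @ x_w), 0, atol=1e-6)     # gradient vanishes (up to round-off)

    # Regularized LS: weighted cost plus (x - x0)^T Pi (x - x0)
    x_r = np.linalg.solve(H.T @ W @ H + Pi, H.T @ W @ y + Pi @ x0)
    assert np.allclose(H.T @ W @ (y - H @ x_r), Pi @ (x_r - x0))  # gradient vanishes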

Recursive least squares

    • Suppose at step \(i-1\) we have solved the LS problem \(\matH_{i-1}\vecx\approx \vecy_{i-1}\) with \[ \matH_{i-1}=\left[\begin{array}{ccc}-&\vech_0^T&-\\&\vdots&\\-&\vech_{i-1}^T&-\end{array}\right]\qquad \vecy_{i-1}=\left[\begin{array}{c}y_0\\\vdots\\y_{i-1}\end{array}\right] \]
    • This is different from (and more interesting than) the order-recursive least squares problem
    • We now want to solve \(\matH_{i}\vecx\approx \vecy_{i}\) with \[ \matH_{i}=\left[\begin{array}{ccc}&\matH_{i-1}&\\-&\vech_i^T&-\end{array}\right]\qquad \vecy_{i}=\left[\begin{array}{c}\vecy_{i-1}\\y_{i}\end{array}\right] \]
    • Can we do this efficiently?
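
    • One standard answer, sketched here rather than derived: propagate \(P_i\eqdef(\matH_i^T\matH_i)^{-1}\) with a rank-one update (matrix inversion lemma), so that each new row only costs \(O(n^2)\) operations; the data below is arbitrary and simply checks the recursion against the batch solution

    import numpy as np

    rng = np.random.default_rng(5)
    n, steps = 3, 40
    rows = rng.standard_normal((steps, n))    # the rows h_i^T, one per time step
    ys = rng.standard_normal(steps)

    # Initialize from the first n rows solved in batch (assumed invertible)
    H0, y0 = rows[:n], ys[:n]
    P = np.linalg.inv(H0.T @ H0)              # P_i = (H_i^T H_i)^{-1}
    x = P @ (H0.T @ y0)

    for i in range(n, steps):
        h, yi = rows[i], ys[i]
        Ph = P @ h
        gain = Ph / (1.0 + h @ Ph)            # gain vector, via the matrix inversion lemma
        x = x + gain * (yi - h @ x)           # correct the estimate with the new residual
        P = P - np.outer(gain, Ph)            # rank-one update of (H_i^T H_i)^{-1}

    # Check against solving the full problem in one shot
    x_batch, *_ = np.linalg.lstsq(rows, ys, rcond=None)
    assert np.allclose(x, x_batch)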