The Mathematics of ECE

Probabilities - Estimation

Wednesday, September 08, 2021

Probabilities: roadmap

  • Last time: review of foundations and concentration of measure
    • Key concepts: Conditional distributions, conditional expectation
    • Key results: Chebyshev, Hoeffding
    • Useful in signal processing, information theory, machine learning
  • Today: estimation - given \(y\), what is \(x\)?
    • Key concepts: conditional expectation, minimum mean square estimation
    • Key result: orthogonality principle
    • Useful in signal processing, robotics, machine learning

Estimation

  • Assume that \(x\in\bbR^p\) and \(y\in\bbR^q\) are dependent random vectors with known \(p_{xy}\)
    • cardinal sin: we are using lowercase for random vectors
    • This is very typical in controls and robotics
  • Objective: estimate \(x\) from \(y\)
    • Think \(y=x+\text{noise}\), a sensor measurement
    • We want to form \(\hat{x}=h(y)\) with some estimator \(h:\bbR^q\to\bbR^p\)
    • We need a measure of performance
  • The least mean square (LMS) estimator \(h^*\) is \[ h^*=\argmin_h\underbrace{\E{(x-h(y))(x-h(y))^\intercal}}_{\eqdef P(h)\text{, a matrix}} \] in the sense that \(\forall h\), \(\forall a\in\bbR^p\), \(a^\intercal P(h) a \geq a^\intercal P(h^*) a\). \(\hat{x}\eqdef h^*(y)\) is called the minimum mean square estimate (MMSE) of \(x\) given \(y\).
  • Make sure you understand where the matrices and vectors are!
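  • To see where the matrices and vectors live (a short unpacking, not on the original slide): write \(P(h)\eqdef\E{(x-h(y))(x-h(y))^\intercal}\in\bbR^{p\times p}\) for the underbraced error covariance matrix. Choosing \(a=e_i\), the \(i\)-th standard basis vector, shows that \(h^*\) minimizes every component's mean square error simultaneously, \[ e_i^\intercal P(h)e_i=\E{(x_i-h_i(y))^2}\geq\E{(x_i-h_i^*(y))^2}, \] and summing over \(i\) (taking the trace) recovers the scalar criterion \[ \operatorname{tr}P(h)=\E{\|x-h(y)\|^2}\geq\E{\|x-h^*(y)\|^2}. \]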

LMS solution

  • The LMS estimate of \(x\) given \(y\) is \(\hat{x}=\E{x|y}\).
  • This is nice; it looks simple… but we need to know \(p_{xy}\)

  • That could be hard in practice

  • Consider jointly distributed, real-valued, zero-mean Gaussian random vectors \(x\) and \(y\) with nonsingular covariance matrix \[ R\eqdef\mat{cc}{R_x&R_{xy}\\R_{yx}&R_y}\text{ where } R_x\eqdef\E{xx^\intercal}, R_y\eqdef\E{yy^\intercal}, R_{xy}=R_{yx}^\intercal\eqdef\E{xy^\intercal} \] The MMSE estimate is \(\hat{x} = R_{xy}R_y^{-1}y\) and is linear in \(y\)
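  • A minimal numerical sketch (not from the lecture; the dimensions, covariance blocks, and variable names below are illustrative choices) to check \(\hat{x}=R_{xy}R_y^{-1}y\) against Monte Carlo samples, comparing the empirical error covariance with \(R_x-R_{xy}R_y^{-1}R_{yx}\) (the \(P(K_0)\) formula of the next section):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, N = 2, 3, 200_000                 # dim(x), dim(y), number of Monte Carlo samples

# Build a valid joint covariance R with blocks R_x, R_xy, R_y (illustrative choice).
A = rng.standard_normal((p + q, p + q))
R = A @ A.T + 0.1 * np.eye(p + q)
R_x, R_xy, R_y = R[:p, :p], R[:p, p:], R[p:, p:]

# Draw jointly Gaussian zero-mean samples and split them into x and y.
z = rng.multivariate_normal(np.zeros(p + q), R, size=N)
x, y = z[:, :p], z[:, p:]

# MMSE estimate in the Gaussian case: x_hat = R_xy R_y^{-1} y (applied row-wise).
K = R_xy @ np.linalg.inv(R_y)
x_hat = y @ K.T

# Empirical error covariance vs. the closed form R_x - R_xy R_y^{-1} R_yx.
err = x - x_hat
print(np.round(err.T @ err / N, 3))                             # empirical
print(np.round(R_x - R_xy @ np.linalg.inv(R_y) @ R_xy.T, 3))    # theoretical
```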

Linear least mean square estimate

  • For simplicity, we might want to restrict ourselves to linear estimates of the form \(\hat{x}\eqdef K_0 y\).

  • Assume \(x\in\bbR^p\), \(y\in\bbR^q\) are zero-mean random vectors. The LLMS estimate of \(x\) given \(y\) is of the form \(\hat{x}=K_0y\) with \(K_0\) a solution of the normal equation \[ K_0R_y = R_{xy}\text{ where }R_y = \E{yy^\intercal}, R_{xy}=\E{xy^\intercal} \] The corresponding error covariance matrix is \(P(K_0)=R_x-K_0R_{yx}\)
  • If \(R_y>0\), we have \(K_0=R_{xy}R_y^{-1}\)

  • We only need to know the second-order statistics of \(x\) and \(y\) (see the sketch after this list)

  • Question: how do we deal with non-zero mean?
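  • A minimal sketch of both points above (the data-generating model, variable names, and the centering step are my own illustrative assumptions, not from the slides): \(K_0\) is computed from sample second-order statistics only by solving the normal equation \(K_0R_y=R_{xy}\), and one common way to handle non-zero means is to center the data and add the mean of \(x\) back, giving an affine estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

# Hypothetical non-zero-mean model: y = H x + noise (illustrative only).
mu_x = np.array([1.0, -2.0])
x = mu_x + rng.standard_normal((N, 2))
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = x @ H.T + 0.5 * rng.standard_normal((N, 3))

# Center the data: the LLMS result above assumes zero-mean vectors.
xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)

# Sample second-order statistics R_xy = E[x y^T] and R_y = E[y y^T].
R_xy = xc.T @ yc / N
R_y = yc.T @ yc / N

# Normal equation K_0 R_y = R_xy; with R_y symmetric this reads R_y K_0^T = R_xy^T.
K0 = np.linalg.solve(R_y, R_xy.T).T

# Affine estimate: estimate the centered part, then add the mean of x back.
x_hat = x.mean(axis=0) + yc @ K0.T
print(np.mean(np.sum((x - x_hat) ** 2, axis=1)))   # empirical mean square error
```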

Geometric view

  • The LLMS solution is such that \(K_0R_y = R_{xy}\), equivalently \[ \E{(x-K_0y)y^\intercal} = 0 \]

  • This could be viewed as an orthogonality condition (linear algebra!!!!)

  • For centered (zero-mean) random vectors, define \(\dotp{x}{y}\eqdef\E{xy^\intercal}\)

    • This is linear
    • This is symmetric
    • This is positive
  • The LLMS estimate of \(x\) given \(y\) is characterized by the fact that the error \(\tilde{x}\eqdef x-\hat{x}\) is orthogonal (uncorrelated) to the observation \(y\). Equivalently, the LLMS estimate is the projection of \(x\) onto the linear space spanned by \(y\).
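  • As a consequence (a step not spelled out on the slides, but immediate from the orthogonality above): since \(\E{\tilde{x}y^\intercal}=\E{(x-K_0y)y^\intercal}=0\), \[ P(K_0)=\E{\tilde{x}\tilde{x}^\intercal}=\E{\tilde{x}(x-K_0y)^\intercal}=\E{\tilde{x}x^\intercal}-\underbrace{\E{\tilde{x}y^\intercal}}_{=0}K_0^\intercal=\E{(x-K_0y)x^\intercal}=R_x-K_0R_{yx}, \] which is exactly the error covariance formula quoted in the previous section.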