Matthieu Bloch
Tuesday, September 20, 2022
Smoothing: estimate \(x_i\) from \(\set{y_j}_{j=0}^m\), \(m>i\) (using past, present and future observations) as \[ \hat{x}_{i|m} \eqdef \sum_{j=0}^{m} k_{i,j}y_j \]
Causal filtering: estimate \(x_i\) from \(\set{y_j}_{j=0}^{i}\) (using past and present observations) as \[ \hat{x}_{i|i} \eqdef \sum_{j=0}^{i} k_{i,j}y_j \]
Prediction: estimate \(x_{i+\ell}\) from \(\set{y_j}_{j=0}^{i}\), \(\ell\geq 1\) (using past observations) as \[ \hat{x}_{i+\ell|i} \eqdef \sum_{j=0}^{i} k_{i,j}y_j \]
In all cases we want the estimate to be optimal, i.e., to minimize the error covariance matrix
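To make the criterion explicit (a standard restatement, with \(\mathbb{E}\) denoting expectation and \(\tilde{x}\eqdef x-\hat{x}\) the estimation error in each of the three cases): optimality means choosing the coefficients \(k_{i,j}\) so that the error covariance is smallest in the positive semidefinite order, \[ \mathbb{E}\big[\tilde{x}\tilde{x}^T\big] \preceq \mathbb{E}\Big[\big(x-\textstyle\sum_j c_j y_j\big)\big(x-\textstyle\sum_j c_j y_j\big)^T\Big] \quad\text{for every choice of coefficients } c_j, \] where the sum ranges over the same observation indices; in particular this minimizes the mean-square error \(\mathbb{E}\big[\norm{\tilde{x}}^2\big]\).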
Let's put what we've learned to work: geometry!
Smoothing reduces to solving the normal equations: for \(\matR_\vecy\succ 0\), \[ \hat{\vecx}_{s} = \matR_{\vecx\vecy}\matR_{\vecy}^{-1}\vecy \] where \[ \hat{\vecx}_{s}\eqdef\left[\begin{array}{c}\hat{x}_{0|m}\\\vdots\\\hat{x}_{m|m}\end{array}\right]\quad \matR_{\vecy}\eqdef\left[\matR_y(i,j)\right]\quad \matR_{\vecx\vecy}\eqdef\left[\matR_{xy}(i,j)\right] \]
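As a numerical sanity check, a minimal sketch of the smoothing formula above, assuming a toy model (an AR(1)-like signal covariance with \(y_i=x_i+v_i\) and noise variance \(0.1\); all names and numbers are illustrative, not from the notes), and using a linear solve rather than forming \(\matR_\vecy^{-1}\) explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5                                                # estimate x_0, ..., x_m from y_0, ..., y_m

# Toy second-order statistics (illustrative): x has an AR(1)-like covariance,
# and y_i = x_i + v_i with white noise v of variance 0.1.
idx = np.arange(m + 1)
R_x  = 0.9 ** np.abs(idx[:, None] - idx[None, :])    # R_x(i, j)
R_y  = R_x + 0.1 * np.eye(m + 1)                     # R_y(i, j)
R_xy = R_x                                           # R_xy(i, j) = E[x_i y_j]

x = rng.multivariate_normal(np.zeros(m + 1), R_x)    # one realization of the signal
y = x + np.sqrt(0.1) * rng.standard_normal(m + 1)    # corresponding observations

# Smoothed estimates x_hat_s = R_xy R_y^{-1} y, via a linear solve (R_y positive definite).
x_hat_s = R_xy @ np.linalg.solve(R_y, y)
print(np.round(x, 3))
print(np.round(x_hat_s, 3))                          # x_hat_{i|m}, i = 0, ..., m
```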
For \(\matR_\vecy\succ 0\) decomposed as \(\matR_\vecy=\matL\matD\matL^T\) (\(\matL\) lower triangular, \(\matD\) diagonal), causal filtering reduces to \[ \hat{\vecx}_{f} = \mathcal{L}\left[\matR_{\vecx\vecy}\matL^{-T}\matD^{-1}\right]\matL^{-1}\vecy \] where \[ \hat{\vecx}_{f}\eqdef\left[\begin{array}{c}\hat{x}_{0|0}\\\hat{x}_{1|1}\\\vdots\\\hat{x}_{m|m}\end{array}\right]\quad \matR_{\vecy}\eqdef\left[\matR_y(i,j)\right]\quad \matR_{\vecx\vecy}\eqdef\left[\matR_{xy}(i,j)\right] \] and \(\mathcal{L}[\cdot]\) is the operator that extracts the lower-triangular part of a matrix (entries above the diagonal are set to zero).
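A minimal numerical check of the filtering formula above (same toy covariances as before; all names are illustrative assumptions): it builds \(\matR_\vecy=\matL\matD\matL^T\) from the Cholesky factor, applies \(\mathcal{L}[\matR_{\vecx\vecy}\matL^{-T}\matD^{-1}]\matL^{-1}\vecy\), and verifies each \(\hat{x}_{i|i}\) against a direct projection of \(x_i\) onto \(\set{y_j}_{j=0}^{i}\).

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
idx = np.arange(m + 1)
R_x  = 0.9 ** np.abs(idx[:, None] - idx[None, :])   # same toy covariances as above (illustrative)
R_y  = R_x + 0.1 * np.eye(m + 1)
R_xy = R_x
y = rng.multivariate_normal(np.zeros(m + 1), R_y)   # one realization of the observations

# LDL^T factorization of R_y, built from its Cholesky factor (valid since R_y is positive definite).
C  = np.linalg.cholesky(R_y)                        # R_y = C C^T
dc = np.diag(C)
L  = C / dc                                         # unit lower triangular
d  = dc ** 2                                        # diagonal of D, so R_y = L D L^T

e = np.linalg.solve(L, y)                           # innovations e = L^{-1} y
K = np.tril(np.linalg.solve(L, R_xy.T).T / d)       # lower-triangular part of R_xy L^{-T} D^{-1}
x_hat_f = K @ e                                     # filtered estimates x_hat_{i|i}

# Check: each x_hat_{i|i} matches the direct projection of x_i onto span{y_0, ..., y_i}.
for i in range(m + 1):
    k_i = np.linalg.solve(R_y[: i + 1, : i + 1], R_xy[i, : i + 1])
    assert np.isclose(x_hat_f[i], k_i @ y[: i + 1])
print(np.round(x_hat_f, 3))
```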
A key difficulty we create is that \(\matR_\vecy\) needs to be inverted
The normal equations are obtained by projecting onto a subspace
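For reference, the one-line derivation behind this statement (a standard argument, with \(\mathbb{E}\) denoting expectation): writing the linear estimator as \(\hat{\vecx}=\mathbf{K}\vecy\), the projection (orthogonality) principle requires the error to be uncorrelated with every observation, \[ \mathbb{E}\big[(\vecx-\mathbf{K}\vecy)\vecy^T\big]=\mathbf{0} \quad\Longleftrightarrow\quad \mathbf{K}\matR_{\vecy}=\matR_{\vecx\vecy}, \] which for \(\matR_\vecy\succ 0\) gives \(\mathbf{K}=\matR_{\vecx\vecy}\matR_{\vecy}^{-1}\), i.e., the smoothing solution above.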
Gram-Schmidt orthogonalization for random variables \[ \vece_0 = \vecy_0\qquad\qquad \forall i \geq 1\quad \vece_i = \vecy_i-\underbrace{\sum_{j=0}^{i-1}\dotp{\vecy_i}{\vece_j}\norm{\vece_j}^{-2}\vece_j}_{\hat{\vecy}_i} \]
The random variable \(\vece_i\eqdef \vecy_i-\hat{\vecy}_i\) is called the innovation
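A small sketch of the Gram-Schmidt recursion for random variables, using the inner product \(\dotp{a}{b}=\mathbb{E}[ab]\) evaluated through a toy covariance \(\matR_\vecy\) (illustrative assumption): each random variable is represented by its coefficient vector over \((y_0,\dots,y_m)\), and the resulting innovations are checked to be uncorrelated and to coincide with \(\matL^{-1}\vecy\) from the \(\matL\matD\matL^T\) factorization.

```python
import numpy as np

m = 5
idx = np.arange(m + 1)
R_y = 0.9 ** np.abs(idx[:, None] - idx[None, :]) + 0.1 * np.eye(m + 1)   # toy R_y (illustrative)

# Represent a random variable a.y by its coefficient vector a over (y_0, ..., y_m);
# the inner product is then <a.y, b.y> = E[(a.y)(b.y)] = a^T R_y b.
def inner(a, b):
    return a @ R_y @ b

# Gram-Schmidt: e_0 = y_0,  e_i = y_i - sum_{j<i} <y_i, e_j> ||e_j||^{-2} e_j.
E = np.zeros((m + 1, m + 1))              # row i holds the coefficients of the innovation e_i
for i in range(m + 1):
    yi = np.eye(m + 1)[i].copy()          # coefficients of y_i
    ei = yi
    for j in range(i):
        ei = ei - inner(yi, E[j]) / inner(E[j], E[j]) * E[j]
    E[i] = ei

# The innovations are uncorrelated ...
Re = E @ R_y @ E.T                        # Gram matrix of the innovations
assert np.allclose(Re, np.diag(np.diag(Re)))

# ... and the coefficient matrix coincides with L^{-1} from R_y = L D L^T (so e = L^{-1} y).
C = np.linalg.cholesky(R_y)
L = C / np.diag(C)                        # unit lower triangular factor
assert np.allclose(E, np.linalg.inv(L))
print(np.round(np.diag(Re), 3))           # innovation variances = diag(D)
```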