Extended Kalman filtering

Matthieu Bloch

Tuesday October 25, 2022

Today in ECE 6555

  • Announcements
    • Office hours today in TSRB 132 (instead of 423)
    • No class on Thursday October 27, 2022 (travel)
    • Drop date: October 29, 2022, by 4:00 pm
    • Bonus assignment: due by November 7, 2022, only graded if it improves your grade (look at it before the drop date!)
    • Kalman filtering mini project due November 10, 2022
      • Can address questions 1-8
      • Need more concepts to do the rest, covered this week!
  • Last time
    • Linearized Kalman filtering
  • Today
    • Extended Kalman filtering
  • Questions?

Last Time: Nonlinear filtering

  • Many interesting practical systems are not linear! Consider the nonlinear discrete-time model \[ \vecx_{i+1} = f_i(\vecx_i) + g_i(\vecx_i) \vecu_i\qquad \vecy_i = h_i(\vecx_i)+\vecv_i \] where \(f_i(\cdot)\), \(g_i(\cdot)\), \(h_i(\cdot)\) are nonlinear and time-varying (a concrete example is sketched after this list).

  • \(\vecx_0\) is assumed to be random with mean \(\bar{\vecx}_0\) and second-order statistics \[ \begin{aligned} \dotp{\mat{c}{ \vecx_0-\bar{\vecx}_0\\ \vecu_i\\ \vecv_i}}{\mat{c}{ \vecx_0-\bar{\vecx}_0\\ \vecu_j\\ \vecv_j\\1}} = \mat{cccc}{\Pi_0&0&0&0\\0& \matQ_i\delta_{ij}&0&0\\0&0&\matR_i\delta_{ij}&0}. \end{aligned} \]

  • Question: can we leverage what we obtained for linear filtering?
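  • As a concrete illustration, here is a minimal Python sketch of such a model for a discretized pendulum; the model, constants, and noise covariances are hypothetical choices made for this example, not part of the course material.

```python
import numpy as np

# Hypothetical nonlinear model: a discretized pendulum (illustrative only).
# State x = [angle, angular rate]; the measurement is the horizontal position.
dt, grav, length = 0.01, 9.81, 1.0

def f(x):
    """State transition f_i(x) (time-invariant here for simplicity)."""
    return np.array([x[0] + dt * x[1],
                     x[1] - dt * (grav / length) * np.sin(x[0])])

def g(x):
    """Noise input map g_i(x); the noise only drives the angular rate."""
    return np.array([[0.0], [dt]])

def h(x):
    """Measurement map h_i(x)."""
    return np.array([np.sin(x[0])])

# One step of x_{i+1} = f_i(x_i) + g_i(x_i) u_i and y_i = h_i(x_i) + v_i
Q, R = np.array([[0.1]]), np.array([[0.05]])
rng = np.random.default_rng(0)
x = np.array([0.5, 0.0])
y = h(x) + rng.multivariate_normal(np.zeros(1), R)
x_next = f(x) + g(x) @ rng.multivariate_normal(np.zeros(1), Q)
```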

Last Time: Linearized Kalman filter (LKF)

  • Idea: open-loop linearization of the state-space equations around a known nominal trajectory \(\vecx_i^\text{nom}\). \[ \vecx_0^{\text{nom}} = \bar{\vecx}_0\qquad \vecx^{\text{nom}}_{i+1} = f_i(\vecx^{\text{nom}}_{i}) \]

    • Nominal trajectory is deterministic
    • We can write \(\vecx_i = \vecx^{\text{nom}}_{i}+\Delta \vecx_i\) where \(\Delta \vecx_i\) is random
  • Assume \(\set{f_i,g_i,h_i}_{i\geq 0}\) are smooth enough for a first-order Taylor expansion \[ f_i(\vecx_i) \approx f_i(\vecx^{\text{nom}}_{i}) + \matF_i \Delta \vecx_i\qquad h_i(\vecx_i)\approx h_i(\vecx_i^{\text{nom}}) + \matH_i \Delta \vecx_i \] with \(\matF_i\) and \(\matH_i\) defined as \[ \matF_i\eqdef \left.\frac{\partial f_i(x)}{\partial x}\right\vert_{x=\vecx^{\text{nom}}_{i}}\qquad \matH_i\eqdef \left.\frac{\partial h_i(x)}{\partial x}\right\vert_{x=\vecx^{\text{nom}}_{i}} \] (a numerical sketch of these Jacobians follows below)

  • Make the zeroth-order approximation \(g_i(\vecx_i)\approx g_i(\vecx^{\text{nom}}_{i})\eqdef \matG_i\)
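  • In practice \(\matF_i\) and \(\matH_i\) can also be evaluated numerically along the nominal trajectory; the helper below is an illustrative central-difference sketch applied to the hypothetical pendulum model above.

```python
import numpy as np

def jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian of func at x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    fx = np.atleast_1d(func(x))
    J = np.zeros((fx.size, x.size))
    for k in range(x.size):
        dx = np.zeros_like(x)
        dx[k] = eps
        J[:, k] = (np.atleast_1d(func(x + dx)) - np.atleast_1d(func(x - dx))) / (2 * eps)
    return J

# F_i and H_i for the pendulum sketch, evaluated at a nominal point x_nom
dt, grav, length = 0.01, 9.81, 1.0
x_nom = np.array([0.5, 0.0])
F = jacobian(lambda x: np.array([x[0] + dt * x[1],
                                 x[1] - dt * (grav / length) * np.sin(x[0])]), x_nom)
H = jacobian(lambda x: np.array([np.sin(x[0])]), x_nom)
```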

Last Time: Linearized Kalman filter (LKF)

  • Substituting the expansions yields the linearized model \[ \Delta \vecx_{i+1} = \matF_i \Delta \vecx_i + \matG_i \vecu_i\qquad \vecy_i-h_i(\vecx^{\text{nom}}_{i}) = \matH_i \Delta \vecx_i + \vecv_i \]

  • The LKF recursions become \[ \begin{aligned} \hat{\vecx}_{i+1|i} &= \matF_i (\hat{\vecx}_{i|i} - \vecx_i^{\text{nom}}) + f_i(\vecx_i^{\text{nom}})\\ \hat{\vecx}_{i|i} &= \hat{\vecx}_{i|i-1}+\matK_{f,i} \left(\vecy_i-h_i(\vecx_i^{\text{nom}})-\matH_i (\hat{\vecx}_{i|i-1}-\vecx_i^{\text{nom}})\right)\\ \matK_{f,i}&= \matP_{i|i-1}\matH_i^T (\matH_i\matP_{i|i-1}\matH_i^T+\matR_i)^{-1}\\ \matP_{i|i}&=(\matI-\matK_{f,i}\matH_i)\matP_{i|i-1}\\ \matP_{i+1|i}&=\matF_i\matP_{i|i}\matF_i^T + \matG_i \matQ_i \matG_i^T. \end{aligned} \]

  • This doesn't really work
    • For small \(i\), the nominal trajectory is close to the true solution
    • Quickly breaks down as noise accumulates through \(\norm{g_i(\vecx_i)\vecu_i}\): the true trajectory drifts away from the nominal one (see the sketch below)
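  • One way to see the breakdown: propagate the nominal trajectory open loop next to a noisy simulated trajectory and watch the gap grow. The sketch below reuses the hypothetical pendulum model; the noise level is arbitrary.

```python
import numpy as np

# Open-loop nominal trajectory vs. a noisy simulated trajectory: the gap
# between them, i.e. the linearization error the LKF must absorb, grows
# as process noise accumulates. Model and noise level are illustrative.
dt, grav, length = 0.01, 9.81, 1.0
f = lambda x: np.array([x[0] + dt * x[1],
                        x[1] - dt * (grav / length) * np.sin(x[0])])
g_mat = np.array([[0.0], [dt]])
rng = np.random.default_rng(1)

x_nom = np.array([0.5, 0.0])
x_true = x_nom.copy()
gaps = []
for i in range(5000):
    x_nom = f(x_nom)                                    # deterministic, open loop
    x_true = f(x_true) + g_mat @ rng.normal(0.0, 0.3, size=1)
    gaps.append(np.linalg.norm(x_true - x_nom))
# gaps[i] typically becomes large, so x_nom is a poor linearization point.
```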

Extended Kalman filter (EKF)

  • Idea: relinearization at every step around the current estimate

  • Linearize around the most recent estimates: \[ f_i(\vecx_i) \approx f_i(\hat{\vecx}_{i|i}) + \matF_i (\vecx_i-\hat{\vecx}_{i|i}),\quad h_i(\vecx_i) \approx h_i(\hat{\vecx}_{i|i-1}) + \matH_i (\vecx_i-\hat{\vecx}_{i|i-1}) \] where \(\matF_i\) and \(\matH_i\) are now the Jacobians evaluated at \(\hat{\vecx}_{i|i}\) and \(\hat{\vecx}_{i|i-1}\), respectively

    \[ g_i(\vecx_i)\approx g_i(\hat{\vecx}_{i|i})\eqdef \matG_i \]

  • Treating the approximations as equalities gives the pseudo-linear model \[ \vecx_{i+1}=\matF_i \vecx_i + (f_i(\hat{\vecx}_{i|i})-\matF_i\hat{\vecx}_{i|i}) + \matG_i \vecu_i\qquad \vecy_i - (h_i(\hat{\vecx}_{i|i-1})-\matH_i\hat{\vecx}_{i|i-1})=\matH_i \vecx_i + \vecv_i \]

  • The EKF recursions follow: \[ \begin{aligned} \hat{\vecx}_{i+1|i} &= f_i(\hat{\vecx}_{i|i})\\ \hat{\vecx}_{i|i} &= \hat{\vecx}_{i|i-1}+\matK_{f,i} \left(\vecy_i-h_i(\hat{\vecx}_{i|i-1})\right)\\ \matK_{f,i}&= \matP_{i|i-1}\matH_i^T (\matH_i\matP_{i|i-1}\matH_i^T+\matR_i)^{-1}\\ \matP_{i|i}&=(\matI-\matK_{f,i}\matH_i)\matP_{i|i-1}\\ \matP_{i+1|i}&=\matF_i\matP_{i|i}\matF_i^T + \matG_i \matQ_i \matG_i^T. \end{aligned} \]
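  • A minimal EKF sketch in Python, assuming model functions and a Jacobian helper like the ones above (names and interfaces are mine, not a prescribed implementation):

```python
import numpy as np

def ekf_step(x_filt, P_filt, y_next, f, g, h, jac, Q, R):
    """One EKF cycle: from (x_{i|i}, P_{i|i}) and y_{i+1} to (x_{i+1|i+1}, P_{i+1|i+1}).
    f, g, h are the model functions; jac(func, x) returns the Jacobian of func at x."""
    # Time update, relinearized around the current filtered estimate
    F = jac(f, x_filt)
    G = g(x_filt)
    x_pred = f(x_filt)
    P_pred = F @ P_filt @ F.T + G @ Q @ G.T
    # Measurement update, relinearized around the predicted estimate
    H = jac(h, x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y_next - h(x_pred))
    P_new = (np.eye(len(x_new)) - K @ H) @ P_pred
    return x_new, P_new
```

    • With the pendulum sketches above, one would call ekf_step(x, P, y, f, g, h, jacobian, Q, R) once per measurement, starting from the prior mean and covariance.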

Beyond linear models

  • We are pushing the limits of Kalman filtering!
    • EKF is based on linearization to reduce a nonlinear state space model to a linear one
    • Perhaps the most widely used Kalman filter
    • Sensitive to tuning and how linearization is performed
  • At this stage it helps to take a step back
    • Kalman filtering is actually a special case of optimal filtering for probabilistic state space models
    • We will revisit Kalman filtering within that broader context
    • This will help understand EKF and other approaches (UKF, CKF, particle filtering) more easily
  • We're going to switch gears and do a bit more probability (and less linear algebra)

Probabilistic State Space Model

  • A probabilistic state space model consists of a state evolution and measurement model \[ x_{i+1} \sim p(x_{i+1}|x_{0:i}y_{0:i})\qquad y_i \sim p(y_i|x_{0:i}y_{0:i-1}) \] where \(x_i^\intercal = \mat{ccc}{x_{i,1}&\cdots&x_{i,n}}\) is the state and \(y_i^\intercal= \mat{ccc}{y_{i,1}&\cdots&y_{i,m}}\) is the measurement.

    We define \(x_{0:i}\eqdef \mat{ccc}{x_{0}&\cdots&x_{i}}\).

  • The dynamic model is Markovian if \(p(x_{i+1}|x_{0:i}y_{0:i}) = p(x_{i+1}|x_i)\). The measurement model satisfies conditional independence if \(p(y_i|x_{0:i}y_{0:i-1}) = p(y_i|x_i)\)

  • From now on, unless otherwise specified, we assume Markovianity and conditional independence hold
  • Illustration: functional dependence graphs, hidden Markov model

Functional dependence graphs

  • Consider \(m\) independent random variables and \(n\) functions of these variables. A functional dependence graph is a directed graph having \(m + n\) vertices, and in which edges are drawn from one vertex to another if the random variable of the former vertex is an argument in the function defining the latter.

  • Simple measurement model \(y = x + n\)
  • Let \(x,y\) be jointly distributed random variables with a well-defined PMF or PDF. There exists a random variable \(n\) independent of \(x\) and a function \(f\) such that \(y=f(x,n)\).

Functional dependence graphs

  • Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph \(\mathcal{G}\). The subset \(\calZ\) is said to d-separate \(\calX\) from \(\calY\) if there exists no path between a vertex of \(\calX\) and a vertex of \(\calY\) after the following operations have been performed:
    1. construct the subgraph \(\mathcal{G}'\) consisting of all vertices in \(\calX\), \(\calY\), and \(\calZ\), as well as the edges and vertices encountered when moving backward starting from any of the vertices in \(\calX\), \(\calY\), or \(\calZ\);
    2. in the subgraph \(\mathcal{G}'\), delete all edges coming out of \(\calZ\);
    3. remove all arrows in \(\mathcal{G}'\) to obtain an undirected graph.
  • Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph. If \(\calZ\) d-separates \(\calX\) from \(\calY\), and if we collect the random variables in \(\calX\), \(\calY\), and \(\calZ\) in the random vectors \(x\), \(y\), and \(z\), respectively, then \(x\rightarrow z\rightarrow y\) forms a Markov chain (\(x\) and \(y\) are conditionally independent given \(z\))

  • In the probabilistic state space model, past states \(x_{0:i-1}\) are independent of the future states \(x_{i+1:T}\) and measurements \(y_{i:T}\) given the present state \(x_i\).
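  • The sketch below is a direct transcription of the three-step procedure above for a functional dependence graph stored as a dictionary of parent lists; the toy graph at the end is a hypothetical hidden-Markov-style example.

```python
# Sketch of the d-separation test described above, on a functional dependence
# graph given as a dict {node: [parents]} (an illustrative implementation).
def d_separates(parents, X, Y, Z):
    """Return True if Z d-separates X from Y."""
    # 1. Keep X, Y, Z and everything reachable by moving backward (ancestors).
    keep, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, []))
    # 2./3. Build an undirected graph on 'keep', deleting edges that leave Z.
    adj = {v: set() for v in keep}
    for v in keep:
        for p in parents.get(v, []):
            if p in keep and p not in Z:   # drop the edge p -> v if p is in Z
                adj[v].add(p)
                adj[p].add(v)
    # Check whether any vertex of X can still reach a vertex of Y.
    seen, stack = set(), list(X)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v] - seen)
    return not (seen & Y)

# Toy graph: u0 -> x0 -> x1 <- u1, with measurements y0, y1 of x0, x1
parents = {"x0": ["u0"], "x1": ["x0", "u1"], "y0": ["x0", "v0"], "y1": ["x1", "v1"]}
print(d_separates(parents, {"y0"}, {"y1"}, {"x1"}))   # True: x1 blocks the path
```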

Bayesian optimal filtering

  • Bayesian optimal filtering consists in computing the distribution \(p(x_i|y_{0:i})\) (compare to \(\hat{x}_{i|i}\)) given
    1. Prior distribution \(p(x_0)\)
    2. Probabilistic state space model
    3. Measurement sequence \(y_{0:i}\)
  • The filtering distribution is computed recursively (a finite-state sketch follows this list):
    1. Initialization: start from known \(p(x_0)\)
    2. For \(i\geq 1\)
      1. Prediction: compute \(p(x_i|y_{0:i-1})\) by the Chapman-Kolmogorov equation \[ p(x_i|y_{0:i-1})= \int p(x_i|x_{i-1})p(x_{i-1}|y_{0:i-1})dx_{i-1} \]
      2. Update: compute \(p(x_i|y_{0:i})\) by Bayes' rule \[ p(x_i|y_{0:i})= \frac{1}{Z_i}p(y_i|x_i)p(x_i|y_{0:i-1})\textsf{ with } Z_i= \int p(y_i|x_i)p(x_i|y_{0:i-1})dx_i \]
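  • On a finite state space the recursion becomes a few lines of linear algebra; the transition matrix, likelihoods, and measurements below are toy values chosen only to illustrate the prediction and update steps.

```python
import numpy as np

# Bayesian filtering recursion on a finite state space, where the
# Chapman-Kolmogorov integral becomes a matrix-vector product and Bayes'
# rule a pointwise multiplication. A, B, and the measurements are toy values.
A = np.array([[0.8, 0.2, 0.0],     # A[k, j] = p(x_i = k | x_{i-1} = j)
              [0.2, 0.6, 0.2],
              [0.0, 0.2, 0.8]])
B = np.array([[0.9, 0.1],          # B[k, y] = p(y_i = y | x_i = k)
              [0.5, 0.5],
              [0.1, 0.9]])

p = np.array([1/3, 1/3, 1/3])      # initialization: prior p(x_0)
for y in [0, 0, 1, 1]:             # measurements y_1, y_2, ...
    p = A @ p                      # prediction (Chapman-Kolmogorov)
    p = B[:, y] * p                # update (unnormalized Bayes' rule)
    p /= p.sum()                   # normalization constant Z_i
# p is the filtering distribution over the states after the processed measurements
```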

Special case: Gauss-Markov model

  • A Gauss-Markov model is a linear state-space model driven by Gaussian noise \[ \vecx_{i+1} = \matF_i \vecx_i + \vecu_i\qquad \vecy_i = \matH_i \vecx_i +\vecv_i \] where \(\vecu_i\sim\calN(0,\matQ_i)\) and \(\vecv_i\sim\calN(0,\matR_i)\) are Gaussian white processes (assumed independent).

  • We assume that all variables are real-valued for simplicity

  • Let \(x\) and \(y\) be jointly distributed random variables \[ \mat{c}{x\\y}\sim\calN\left(\mat{c}{\mu_x\\\mu_y},\mat{cc}{\matR_x&\matR_{xy}\\\matR_{yx}&\matR_y}\right) \] Then \(x\sim\calN(\mu_x,\matR_x)\), \(y\sim\calN(\mu_y,\matR_y)\) and \[ \begin{aligned} x|y&\sim\calN(\mu_x+\matR_{xy}\matR_y^{-1}(y-\mu_y),\matR_x-\matR_{xy}\matR_y^{-1}\matR_{yx})\\ y|x&\sim\calN(\mu_y+\matR_{yx}\matR_x^{-1}(x-\mu_x),\matR_y-\matR_{yx}\matR_x^{-1}\matR_{xy}) \end{aligned} \]
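  • A quick numerical sanity check of the conditioning formula (the joint mean, covariance, and observed value below are arbitrary illustrative numbers):

```python
import numpy as np

# Numerical illustration of the Gaussian conditioning formula above, with a
# hypothetical scalar x and y (values chosen arbitrarily for the example).
mu = np.array([1.0, -2.0])                       # [mu_x, mu_y]
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])                   # [[R_x, R_xy], [R_yx, R_y]]
y_obs = -1.0

Rx, Rxy, Ryx, Ry = Sigma[0, 0], Sigma[0, 1], Sigma[1, 0], Sigma[1, 1]
cond_mean = mu[0] + Rxy / Ry * (y_obs - mu[1])   # mu_x + R_xy R_y^{-1} (y - mu_y)
cond_cov = Rx - Rxy / Ry * Ryx                   # R_x - R_xy R_y^{-1} R_yx

# Monte Carlo sanity check: keep samples whose y component is close to y_obs
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
near = samples[np.abs(samples[:, 1] - y_obs) < 0.02, 0]
# near.mean() and near.var() should be close to cond_mean and cond_cov
```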

Kalman filter revisited

  • All distributions are jointly Gaussian in a Gauss-Markov model.

  • The Kalman filter is the Bayesian optimal filter for the Gauss-Markov model
    1. Initialization: \(\vecx_0\sim\calN(\hat{\vecx}_{0|-1},\matP_0)\)
    2. Prediction: \(\vecx_{i|i-1}\sim\calN(\hat{\vecx}_{i|i-1},\matP_{i|i-1})\) with
    \[ \hat{\vecx}_{i|i-1} = \matF_{i-1} \hat{\vecx}_{i-1|i-1}\qquad \matP_{i|i-1} = \matF_{i-1} \matP_{i-1|i-1}\matF_{i-1}^\intercal + \matQ_{i-1} \]
    3. Update: \(\vecx_{i|i}\sim\calN(\hat{\vecx}_{i|i},\matP_{i|i})\) with

    \[ \begin{aligned} \hat{\vecx}_{i|i} &= \hat{\vecx}_{i|i-1}+\matK_{f,i}(\vecy_i-\matH_i\hat{\vecx}_{i|i-1})\\ \matK_{f,i} &= \matP_{i|i-1}\matH_i^\intercal(\matH_i\matP_{i|i-1}\matH_i^\intercal+\matR_i)^{-1}\\ \matP_{i|i}&= \matP_{i|i-1}-\matK_{f,i}\matH_i \matP_{i|i-1} \end{aligned} \]

  • Remark: our earlier analysis did not assume Gaussian distributions
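  • To connect the lemma with the update step, the sketch below (hypothetical matrices) forms the joint Gaussian of the state and measurement given past measurements and conditions on \(\vecy_i\); the conditioning formula reproduces exactly the Kalman update above.

```python
import numpy as np

# Sketch: one Kalman measurement update obtained directly from the Gaussian
# conditioning lemma rather than from the gain formula. Values are hypothetical.
H = np.array([[1.0, 0.0]])                  # measurement matrix H_i
R = np.array([[0.1]])                       # measurement noise covariance R_i
x_pred = np.array([0.3, -0.2])              # \hat{x}_{i|i-1}
P_pred = np.array([[1.0, 0.2],
                   [0.2, 0.5]])             # P_{i|i-1}
y = np.array([0.45])                        # observed y_i

# Joint Gaussian of (x_i, y_i) given y_{0:i-1}:
#   mean = [x_pred, H x_pred]
#   cov  = [[P_pred,    P_pred H^T        ],
#           [H P_pred,  H P_pred H^T + R  ]]
S = H @ P_pred @ H.T + R                    # innovation covariance
cross = P_pred @ H.T                        # cross-covariance R_xy
# Conditioning on y_i gives the Kalman update:
x_filt = x_pred + cross @ np.linalg.solve(S, y - H @ x_pred)
P_filt = P_pred - cross @ np.linalg.solve(S, cross.T)
```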