More Kalman filtering

Matthieu Bloch

Thursday, October 20, 2022

Today in ECE 6555

  • Announcements
    • Office hours next Tuesday October 25, 2022 in TSRB 132 (instead of 432)
    • Midterm graded - HW grades coming up
    • Drop date: October
    • Bonus assignment: due by drop date, only graded if it matters
    • Kalman filtering mini project!
  • Last time
    • Time/Measurement update
    • Quite a few simplifications if we avoid corner cases of singular matrices
  • Today's plan
    • Linearized Kalman filtering, Extended Kalman filtering
  • Questions?

Nonlinear filtering

  • Many interesting practical systems are not linear! Consider the nonlinear discrete-time model \[ x_{i+1} = f_i(x_i) + g_i(x_i) u_i\qquad y_i = h_i(x_i)+v_i \] where \(f_i(\cdot)\), \(g_i(\cdot)\), and \(h_i(\cdot)\) are nonlinear and time-varying (a toy numerical example is sketched at the end of this slide).

  • \(x_0\) is assumed to be random with mean \(\bar{x}_0\) and \[ \begin{aligned} \dotp{\mat{c}{ x_0-\bar{x}_0\\ u_i\\ v_i}}{\mat{c}{ x_0-\bar{x}_0\\ u_j\\ v_j\\1}} = \mat{cccc}{\Pi_0&0&0&0\\0& Q_i\delta_{ij}&0&0\\0&0&R_i\delta_{ij}&0}. \end{aligned} \]

  • Question: can we leverage what we obtained for linear filtering?
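
  • As a concrete, purely illustrative example of such a model (hypothetical, not from the course), here is a discretized pendulum with a nonlinear, noisy angle measurement, sketched in Python/NumPy:

    import numpy as np

    # State x = [angle, angular velocity]; the measurement is a noisy sin(angle).
    dt, grav, q, r = 0.01, 9.81, 1e-3, 1e-2

    def f(x):                                   # nonlinear dynamics f_i(x)
        return np.array([x[0] + dt * x[1], x[1] - dt * grav * np.sin(x[0])])

    def g(x):                                   # noise coupling g_i(x) (state-independent here)
        return np.array([[0.0], [dt]])

    def h(x):                                   # nonlinear measurement h_i(x)
        return np.array([np.sin(x[0])])

    def simulate(x0, n_steps, rng=np.random.default_rng(0)):
        """Sample x_{i+1} = f(x_i) + g(x_i) u_i and y_i = h(x_i) + v_i."""
        xs, ys = [np.asarray(x0, dtype=float)], []
        for _ in range(n_steps):
            x = xs[-1]
            ys.append(h(x) + np.sqrt(r) * rng.standard_normal(1))
            xs.append(f(x) + g(x) @ (np.sqrt(q) * rng.standard_normal(1)))
        return np.array(xs), np.array(ys)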

Linearized Kalman filter (LKF)

  • Idea: open loop linearization of state-space equations around a known nominal trajectory \(x_i^\text{nom}\). \[ x_0^{\text{nom}} = x_0\qquad x^{\text{nom}}_{i+1} = f_i(x^{\text{nom}}_{i}) \]

    • Nominal trajectory is deterministic
    • We can write \(x_i = x^{\text{nom}}_{i}+\Delta x_i\) where \(\Delta x_i\) is random
  • Assume \(\set{f_i,g_i,h_i}_{i\geq 0}\) are smooth enough to allow a first-order Taylor expansion \[ f_i(x_i) \approx f_i(x^{\text{nom}}_{i}) + F_i \Delta x_i\qquad h_i(x_i)\approx h_i(x_i^{\text{nom}}) + H_i \Delta x_i \] with \(F_i\) and \(H_i\) defined as \[ F_i\eqdef \left.\frac{\partial f_i(x)}{\partial x}\right\vert_{x=x^{\text{nom}}_{i}}\qquad H_i\eqdef \left.\frac{\partial h_i(x)}{\partial x}\right\vert_{x=x^{\text{nom}}_{i}} \]

  • Make the zeroth-order approximation \(g_i(x_i)\approx g_i(x^{\text{nom}}_{i})\eqdef G_i\) (a numerical sketch of the Jacobians follows below)
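
  • One simple way to obtain the Jacobians and the nominal trajectory numerically; the helper names are hypothetical and the finite-difference step is a design choice:

    import numpy as np

    def jacobian(func, x, eps=1e-6):
        """Numerical Jacobian of func at x via forward differences."""
        x = np.asarray(x, dtype=float)
        fx = np.atleast_1d(func(x))
        J = np.zeros((fx.size, x.size))
        for k in range(x.size):
            dx = np.zeros(x.size)
            dx[k] = eps
            J[:, k] = (np.atleast_1d(func(x + dx)) - fx) / eps
        return J

    def nominal_trajectory(f, x0, n_steps):
        """x_0^nom = x_0 and x_{i+1}^nom = f_i(x_i^nom): deterministic, no noise."""
        xs = [np.asarray(x0, dtype=float)]
        for _ in range(n_steps):
            xs.append(f(xs[-1]))
        return xs

    # F_i = jacobian(f, x_nom[i]) and H_i = jacobian(h, x_nom[i]) along the nominal trajectory.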

Linearized Kalman filter (LKF)

  • The (approximate) state-space model for the perturbation \(\Delta x_i\) is \[ \Delta x_{i+1} = F_i \Delta x_i + G_i u_i\qquad y_i-h_i(x^{\text{nom}}_{i}) = H_i \Delta x_i + v_i \]

  • Applying the standard Kalman filter recursions to this linearized model yields (see the code sketch at the end of this slide) \[ \begin{aligned} \hat{x}_{i+1|i} &= F_i (\hat{x}_{i|i} - x_i^{\text{nom}}) + f_i(x_i^{\text{nom}})\\ \hat{x}_{i|i} &= \hat{x}_{i|i-1}+K_{f,i} \left(y_i-h_i(x_i^{\text{nom}})-H_i (\hat{x}_{i|i-1}-x_i^{\text{nom}})\right)\\ K_{f,i}&= P_{i|i-1}H_i^T (H_iP_{i|i-1}H_i^T+R_i)^{-1}\\ P_{i|i}&=(I-K_{f,i}H_i)P_{i|i-1}\\ P_{i+1|i}&=F_iP_{i|i}F_i^T + G_i Q_i G_i^T. \end{aligned} \]

  • This doesn't really work in practice
    • For small \(i\), the nominal trajectory remains close to the true trajectory
    • It quickly breaks down as noise accumulates through \(\norm{g_i(x_i)u_i}\) and the true trajectory drifts away from the nominal one
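
  • As an illustration, here is a minimal NumPy sketch of the LKF recursions above, assuming the hypothetical f, g, h, jacobian, and nominal_trajectory helpers from the earlier sketches (Q and R are covariance matrices):

    import numpy as np

    def linearized_kf(f, g, h, x_nom, ys, P0, Q, R, jacobian):
        """Linearized Kalman filter around a precomputed nominal trajectory x_nom."""
        n = P0.shape[0]
        x_hat = np.asarray(x_nom[0], dtype=float)   # \hat{x}_{0|-1}, taking x_0^nom = \bar{x}_0
        P = np.asarray(P0, dtype=float)
        filtered = []
        for i, y in enumerate(ys):
            Fi, Hi, Gi = jacobian(f, x_nom[i]), jacobian(h, x_nom[i]), g(x_nom[i])
            # Measurement update around the nominal trajectory
            K = P @ Hi.T @ np.linalg.inv(Hi @ P @ Hi.T + R)
            x_filt = x_hat + K @ (y - h(x_nom[i]) - Hi @ (x_hat - x_nom[i]))
            P_filt = (np.eye(n) - K @ Hi) @ P
            filtered.append(x_filt)
            # Time update around the nominal trajectory
            x_hat = Fi @ (x_filt - x_nom[i]) + f(x_nom[i])
            P = Fi @ P_filt @ Fi.T + Gi @ Q @ Gi.T
        return filtered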

Extended Kalman filter (EKF)

  • Idea: relinearization at every step around the current estimate

  • Relinearize at each step: \[ f_i(x_i) \approx f_i(\hat{x}_{i|i}) + F_i (x_i-\hat{x}_{i|i}),\quad h_i(x_i) \approx h_i(\hat{x}_{i|i-1}) + H_i (x_i-\hat{x}_{i|i-1}),\quad g_i(x_i)\approx g_i(\hat{x}_{i|i})\eqdef G_i \] where \(F_i\) and \(H_i\) are now the Jacobians evaluated at \(\hat{x}_{i|i}\) and \(\hat{x}_{i|i-1}\), respectively

  • Substituting these approximations gives a linear model with known (data-dependent) offsets \[ x_{i+1}=F_i x_i + (f_i(\hat{x}_{i|i})-F_i\hat{x}_{i|i}) + G_i u_i\qquad y_i - (h_i(\hat{x}_{i|i-1})-H_i\hat{x}_{i|i-1})=H_i x_i + v_i \]

  • The EKF recursions become (sketched in code below) \[ \begin{aligned} \hat{x}_{i+1|i} &= f_i(\hat{x}_{i|i})\\ \hat{x}_{i|i} &= \hat{x}_{i|i-1}+K_{f,i} \left(y_i-h_i(\hat{x}_{i|i-1})\right)\\ K_{f,i}&= P_{i|i-1}H_i^T (H_iP_{i|i-1}H_i^T+R_i)^{-1}\\ P_{i|i}&=(I-K_{f,i}H_i)P_{i|i-1}\\ P_{i+1|i}&=F_iP_{i|i}F_i^T + G_i Q_i G_i^T. \end{aligned} \]
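
  • A matching minimal sketch of the EKF recursions, reusing the same hypothetical helpers as before:

    import numpy as np

    def extended_kf(f, g, h, ys, x0_hat, P0, Q, R, jacobian):
        """Extended Kalman filter: relinearize around the current estimate at every step."""
        n = P0.shape[0]
        x_pred = np.asarray(x0_hat, dtype=float)    # \hat{x}_{0|-1}
        P_pred = np.asarray(P0, dtype=float)        # P_{0|-1}
        filtered = []
        for y in ys:
            # Measurement update: linearize h around \hat{x}_{i|i-1}
            Hi = jacobian(h, x_pred)
            K = P_pred @ Hi.T @ np.linalg.inv(Hi @ P_pred @ Hi.T + R)
            x_filt = x_pred + K @ (y - h(x_pred))
            P_filt = (np.eye(n) - K @ Hi) @ P_pred
            filtered.append(x_filt)
            # Time update: linearize f (and evaluate g) around \hat{x}_{i|i}
            Fi, Gi = jacobian(f, x_filt), g(x_filt)
            x_pred = f(x_filt)
            P_pred = Fi @ P_filt @ Fi.T + Gi @ Q @ Gi.T
        return filtered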

Beyond linear models

  • We are pushing the limits of Kalman filtering!
    • EKF is based on linearization to reduce a nonlinear state space model to a linear one
    • Perhaps the most widely used variant of the Kalman filter
    • Sensitive to tuning and how linearization is performed
  • At this stage it helps to take a step back
    • Kalman filtering is actually a special case of optimal filtering for probabilistic state space models
    • We will revisit Kalman filtering within that broader context
    • This will help understand EKF and other approaches (UKF, CKF, particle filtering) more easily
  • We're going to switch gears and do a bit more probability (and less linear algebra)

Probabilistic State Space Model

  • A probabilistic state space model consists of a state evolution and measurement model \[ x_{i+1} \sim p(x_{i+1}|x_{0:i}y_{0:i})\qquad y_i \sim p(y_i|x_{0:i}y_{0:i-1}) \] where \(x_i^\intercal = \mat{ccc}{x_{i,1}&\cdots&x_{i,n}}\) is the state and \(y_i^\intercal= \mat{ccc}{y_{i,1}&\cdots&y_{i,m}}\) is the measurement (see the sampling sketch at the end of this slide).

    We define \(x_{0:i}\eqdef \mat{ccc}{x_{0}&\cdots&x_{i}}\).

  • The dynamic model is Markovian if \(p(x_{i+1}|x_{0:i}y_{0:i}) = p(x_{i+1}|x_i)\). The measurement model satisfies conditional independence if \(p(y_i|x_{0:i}y_{0:i-1}) = p(y_i|x_i)\)

  • From now on, unless otherwise specified, we assume Markovianity and conditional independence hold

  • Illustration: functional dependence graphs, hidden Markov model
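
  • A minimal sketch (hypothetical numbers) of what these assumptions mean operationally: simulating the model below only ever evaluates \(p(x_{i+1}|x_i)\) and \(p(y_i|x_i)\).

    import numpy as np

    # Scalar Gaussian random-walk state with noisy direct observations.
    rng = np.random.default_rng(0)
    T, q, r = 50, 0.1, 0.5                                 # horizon, process/measurement variances
    x, y = np.zeros(T), np.zeros(T)
    x[0] = rng.normal(0.0, 1.0)                            # prior p(x_0) = N(0, 1)
    y[0] = x[0] + rng.normal(0.0, np.sqrt(r))              # p(y_0|x_0) = N(x_0, r)
    for i in range(T - 1):
        x[i + 1] = x[i] + rng.normal(0.0, np.sqrt(q))      # Markov: p(x_{i+1}|x_{0:i}, y_{0:i}) = N(x_i, q)
        y[i + 1] = x[i + 1] + rng.normal(0.0, np.sqrt(r))  # cond. indep.: p(y_i|x_{0:i}, y_{0:i-1}) = N(x_i, r)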

Functional dependence graphs

  • Consider \(m\) independent random variables and \(n\) functions of these variables. A functional dependence graph is a directed graph having \(m + n\) vertices, and in which edges are drawn from one vertex to another if the random variable of the former vertex is an argument in the function defining the latter.

  • Simple measurement model \(y = x + n\)
  • Let \(x,y\) be jointly distributed random variables with a well-defined PMF or PDF. There exists a random variable \(n\) independent of \(x\) and a function \(f\) such that \(y=f(x,n)\).

Functional dependence graphs

  • Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph \(\mathcal{G}\). The subset \(\calZ\) is said to d-separate \(\calX\) from \(\calY\) if there exists no path between a vertex of \(\calX\) and a vertex of \(\calY\) after the following operations have been performed:
    1. construct the subgraph \(\mathcal{G}'\) consisting of all vertices in \(\calX\), \(\calY\), and \(\calZ\), as well as the edges and vertices encountered when moving backward starting from any of the vertices in \(\calX\), \(\calY\), or \(\calZ\);
    2. in the subgraph \(\mathcal{G}'\), delete all edges coming out of \(\calZ\);
    3. remove all arrows in \(\mathcal{G}'\) to obtain an undirected graph.
  • Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph. If \(\calZ\) d-separates \(\calX\) from \(\calY\), and if we collect the random variables in \(\calX\), \(\calY\), and \(\calZ\) in the random vectors \(x\), \(y\), and \(z\), respectively, then \(x\rightarrow z\rightarrow y\) forms a Markov chain (\(x\) and \(y\) are conditionally independent given \(z\))

  • In the probabilistic state space model, past states \(x_{0:i-1}\) are independent of the future states \(x_{i+1:T}\) and measurements \(y_{i:T}\) given the present state \(x_i\).

Bayesian optimal filtering

  • Bayesian optimal filtering consists in computing the distribution \(p(x_i|y_{0:i})\) (compare to \(\hat{x}_{i|i}\)) given
    1. Prior distribution \(p(x_0)\)
    2. Probabilistic state space model
    3. Measurement sequence \(y_{0:i}\)
  • The distribution is computed recursively (a numerical grid-based sketch follows below)
    1. Initialization: start from known \(p(x_0)\)
    2. For \(i\geq 1\)
      1. Prediction: compute \(p(x_i|y_{0:i-1})\) by the Chapman-Kolmogorov equation
      \[ p(x_i|y_{0:i-1})= \int p(x_i|x_{i-1})p(x_{i-1}|y_{0:i-1})dx_{i-1} \]
      2. Update: compute \(p(x_i|y_{0:i})\) by Bayes' rule
      \[ p(x_i|y_{0:i})= \frac{1}{Z_i}p(y_i|x_i)p(x_i|y_{0:i-1})\textsf{ with } Z_i= \int p(y_i|x_i)p(x_i|y_{0:i-1})dx_i \]
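
  • A brute-force numerical illustration of the prediction/update steps above on a discretized 1-D state space; the function names and arguments are hypothetical:

    import numpy as np

    def grid_bayes_filter(grid, prior, transition_pdf, likelihood, ys):
        """Bayesian optimal filter by quadrature on a 1-D grid.
        grid: (N,) points; prior: p(x_0) on the grid; transition_pdf(x_i, x_prev): p(x_i|x_{i-1});
        likelihood(y, x): p(y_i|x_i). A prediction precedes every update (measurements y_1, y_2, ...)."""
        dx = grid[1] - grid[0]
        p = prior / (prior.sum() * dx)                 # normalized p(x_0)
        # Transition kernel K[j, k] = p(x_i = grid[j] | x_{i-1} = grid[k])
        K = transition_pdf(grid[:, None], grid[None, :])
        posteriors = []
        for y in ys:
            # Prediction (Chapman-Kolmogorov): integrate p(x_i|x_{i-1}) p(x_{i-1}|y_{0:i-1}) over x_{i-1}
            p_pred = K @ p * dx
            # Update (Bayes' rule): multiply by the likelihood p(y_i|x_i) and renormalize by Z_i
            p = likelihood(y, grid) * p_pred
            p /= p.sum() * dx
            posteriors.append(p.copy())
        return posteriors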

Special case: Gauss-Markov model

  • A Gauss-Markov model is a Gaussian driven linear model \[ x_{i+1} = F_i x_i + u_i\qquad y_i = H_i x_i + v_i \] where \(u_i\sim\calN(0,Q_i)\) and \(v_i\sim\calN(0,R_i)\) are Gaussian white processes (assumed independent).

  • We assume that all variables are real-valued for simplicity

  • Let \(x\) and \(y\) be jointly distributed random variables \[ \mat{c}{x\\y}\sim\calN\left(\mat{c}{\mu_x\\\mu_y},\mat{cc}{R_x&R_{xy}\\R_{yx}&R_y}\right) \] Then \(x\sim\calN(\mu_x,R_x)\), \(y\sim\calN(\mu_y,R_y)\) and \[ \begin{aligned} x|y&\sim\calN(\mu_x+R_{xy}R_y^{-1}(y-\mu_y),R_x-R_{xy}R_y^{-1}R_{yx})\\ y|x&\sim\calN(\mu_y+R_{yx}R_x^{-1}(x-\mu_x),R_y-R_{yx}R_x^{-1}R_{xy}) \end{aligned} \]
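
  • A quick numerical sanity check of the conditioning formulas, with hypothetical numbers:

    import numpy as np

    rng = np.random.default_rng(1)
    mu_x, mu_y = np.array([1.0]), np.array([-0.5])
    Rx, Rxy, Ry = np.array([[2.0]]), np.array([[0.8]]), np.array([[1.5]])

    y_obs = np.array([0.3])
    cond_mean = mu_x + Rxy @ np.linalg.solve(Ry, y_obs - mu_y)   # mu_x + Rxy Ry^{-1} (y - mu_y)
    cond_cov = Rx - Rxy @ np.linalg.solve(Ry, Rxy.T)             # Rx - Rxy Ry^{-1} Ryx

    # Monte Carlo check: sample the joint Gaussian and keep x for y close to y_obs
    samples = rng.multivariate_normal(np.concatenate([mu_x, mu_y]),
                                      np.block([[Rx, Rxy], [Rxy.T, Ry]]), size=200_000)
    near = np.abs(samples[:, 1] - y_obs[0]) < 0.05
    print(cond_mean, cond_cov)
    print(samples[near, 0].mean(), samples[near, 0].var())       # ~ cond_mean, cond_cov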

Kalman filter revisited

  • All distributions are jointly Gaussian in a Gauss-Markov model.

  • The Kalman filter is the Bayesian optimal filter for the Gauss-Markov model
    1. Initialization: \(x_0\sim\calN(\hat{x}_{0|-1},P_0)\)
    2. Prediction: \(x_{i|i-1}\sim\calN(\hat{x}_{i|i-1},P_{i|i-1})\) with
    \[ \hat{x}_{i|i-1} = F_{i-1} \hat{x}_{i-1|i-1}\qquad P_{i|i-1} = F_{i-1} P_{i-1|i-1}F_{i-1}^\intercal + Q_{i-1} \]
    3. Update: \(x_{i|i}\sim\calN(\hat{x}_{i|i},P_{i|i})\) with

    \[ \begin{aligned} \hat{x}_{i|i} &= \hat{x}_{i|i-1}+K_{f,i}(y_i-H_i\hat{x}_{i|i-1})\\ K_{f,i} &= P_{i|i-1}H_i^\intercal(H_iP_{i|i-1}H_i^\intercal+R_i)^{-1}\\ P_{i|i}&= P_{i|i-1}-K_{f,i}H_i P_{i|i-1} \end{aligned} \]

  • Remark: our initial analysis did not assume Gaussian distributions; it only used second-order statistics (a code sketch of the recursions above is given below)
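
  • A minimal NumPy sketch of the recursions above (time-invariant \(F\), \(H\), \(Q\), \(R\) for brevity; the names are hypothetical):

    import numpy as np

    def kalman_filter(F, H, Q, R, ys, x0_hat, P0):
        """Kalman filter as the Bayesian filter of a Gauss-Markov model: every predictive and
        filtering distribution is Gaussian, so only means and covariances are propagated."""
        x_pred = np.asarray(x0_hat, dtype=float)   # \hat{x}_{0|-1}
        P_pred = np.asarray(P0, dtype=float)       # P_0
        means, covs = [], []
        for y in ys:
            # Update: condition N(\hat{x}_{i|i-1}, P_{i|i-1}) on y_i (Gaussian conditioning lemma)
            K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
            x_filt = x_pred + K @ (y - H @ x_pred)
            P_filt = P_pred - K @ H @ P_pred
            means.append(x_filt)
            covs.append(P_filt)
            # Prediction: propagate the Gaussian through x_{i+1} = F x_i + u_i
            x_pred = F @ x_filt
            P_pred = F @ P_filt @ F.T + Q
        return means, covs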