Extended Kalman filtering

Matthieu Bloch

Tuesday October 25, 2022

Today in ECE 6555

  • Announcements
    • Office hours today in TSRB 132 (instead of 423)
    • No class on Thursday October 27, 2022 (travel)
    • Drop date: October 29, 2022, by 4:00 pm
    • Bonus assignment: due by November 7, 2022, only graded if it improves your grade (look at it before the drop date!)
    • Kalman filtering mini project due November 10, 2022
      • Can address questions 1-8
      • Need more concepts to do the rest, covered this week!
  • Last time
    • Linearized Kalman filtering
  • Today
    • Extended Kalman filtering
  • Questions?

Last Time: Nonlinear filtering

  • Many interesting practical systems are not linear! Consider the nonlinear discrete-time model \[ \vecx_{i+1} = f_i(\vecx_i) + g_i(\vecx_i) \vecu_i\qquad \vecy_i = h_i(\vecx_i)+\vecv_i \] where \(f_i(\cdot)\), \(g_i(\cdot)\), \(h_i(\cdot)\) are nonlinear and time-varying (a concrete example is sketched after this list).

  • \(\vecx_0\) is assumed to be random with mean \(\bar{\vecx}_0\) and second-order statistics \[ \begin{aligned} \dotp{\mat{c}{ \vecx_0-\bar{\vecx}_0\\ \vecu_i\\ \vecv_i}}{\mat{c}{ \vecx_0-\bar{\vecx}_0\\ \vecu_j\\ \vecv_j\\1}} = \mat{cccc}{\Pi_0&0&0&0\\0& \matQ_i\delta_{ij}&0&0\\0&0&\matR_i\delta_{ij}&0}. \end{aligned} \]

  • Question: can we leverage what we obtained for linear filtering?
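  • As a concrete illustration, here is a minimal Python sketch of such a model for a discretized pendulum; the model, constants, and noise covariances are hypothetical choices made for this example, not part of the course material.

```python
import numpy as np

# Hypothetical nonlinear model: a discretized pendulum (illustrative only).
# State x = [angle, angular rate]; the measurement is the horizontal position.
dt, grav, length = 0.01, 9.81, 1.0

def f(x):
    """State transition f_i(x) (time-invariant here for simplicity)."""
    return np.array([x[0] + dt * x[1],
                     x[1] - dt * (grav / length) * np.sin(x[0])])

def g(x):
    """Noise input map g_i(x); the noise only drives the angular rate."""
    return np.array([[0.0], [dt]])

def h(x):
    """Measurement map h_i(x)."""
    return np.array([np.sin(x[0])])

# One step of x_{i+1} = f_i(x_i) + g_i(x_i) u_i and y_i = h_i(x_i) + v_i
Q, R = np.array([[0.1]]), np.array([[0.05]])
rng = np.random.default_rng(0)
x = np.array([0.5, 0.0])
y = h(x) + rng.multivariate_normal(np.zeros(1), R)
x_next = f(x) + g(x) @ rng.multivariate_normal(np.zeros(1), Q)
```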

Last Time: Linearized Kalman filter (LKF)

  • Idea: open-loop linearization of the state-space equations around a known nominal trajectory \(\vecx_i^\text{nom}\). \[ \vecx_0^{\text{nom}} = \bar{\vecx}_0\qquad \vecx^{\text{nom}}_{i+1} = f_i(\vecx^{\text{nom}}_{i}) \]

    • Nominal trajectory is deterministic
    • We can write \(\vecx_i = \vecx^{\text{nom}}_{i}+\Delta \vecx_i\) where \(\Delta \vecx_i\) is random
  • Assume \(\set{f_i,g_i,h_i}_{i\geq 0}\) are smooth enough for a first-order Taylor expansion \[ f_i(\vecx_i) \approx f_i(\vecx^{\text{nom}}_{i}) + \matF_i \Delta \vecx_i\qquad h_i(\vecx_i)\approx h_i(\vecx_i^{\text{nom}}) + \matH_i \Delta \vecx_i \] with \(\matF_i\) and \(\matH_i\) defined as \[ \matF_i\eqdef \left.\frac{\partial f_i(x)}{\partial x}\right\vert_{x=\vecx^{\text{nom}}_{i}}\qquad \matH_i\eqdef \left.\frac{\partial h_i(x)}{\partial x}\right\vert_{x=\vecx^{\text{nom}}_{i}} \] (a numerical sketch of these Jacobians follows below)

  • Make the zeroth-order approximation \(g_i(\vecx_i)\approx g_i(\vecx^{\text{nom}}_{i})\eqdef \matG_i\)
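  • In practice \(\matF_i\) and \(\matH_i\) can also be evaluated numerically along the nominal trajectory; the helper below is an illustrative central-difference sketch applied to the hypothetical pendulum model above.

```python
import numpy as np

def jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian of func at x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    fx = np.atleast_1d(func(x))
    J = np.zeros((fx.size, x.size))
    for k in range(x.size):
        dx = np.zeros_like(x)
        dx[k] = eps
        J[:, k] = (np.atleast_1d(func(x + dx)) - np.atleast_1d(func(x - dx))) / (2 * eps)
    return J

# F_i and H_i for the pendulum sketch, evaluated at a nominal point x_nom
dt, grav, length = 0.01, 9.81, 1.0
x_nom = np.array([0.5, 0.0])
F = jacobian(lambda x: np.array([x[0] + dt * x[1],
                                 x[1] - dt * (grav / length) * np.sin(x[0])]), x_nom)
H = jacobian(lambda x: np.array([np.sin(x[0])]), x_nom)
```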

Last Time: Linearized Kalman filter (LKF)

  • Substituting the expansions yields the linearized model \[ \Delta \vecx_{i+1} = \matF_i \Delta \vecx_i + \matG_i \vecu_i\qquad \vecy_i-h_i(\vecx^{\text{nom}}_{i}) = \matH_i \Delta \vecx_i + \vecv_i \]

  • The LKF recursions become \[ \begin{aligned} \hat{\vecx}_{i+1|i} &= \matF_i (\hat{\vecx}_{i|i} - \vecx_i^{\text{nom}}) + f_i(\vecx_i^{\text{nom}})\\ \hat{\vecx}_{i|i} &= \hat{\vecx}_{i|i-1}+\matK_{f,i} \left(\vecy_i-h_i(\vecx_i^{\text{nom}})-\matH_i (\hat{\vecx}_{i|i-1}-\vecx_i^{\text{nom}})\right)\\ \matK_{f,i}&= \matP_{i|i-1}\matH_i^T (\matH_i\matP_{i|i-1}\matH_i^T+\matR_i)^{-1}\\ \matP_{i|i}&=(\matI-\matK_{f,i}\matH_i)\matP_{i|i-1}\\ \matP_{i+1|i}&=\matF_i\matP_{i|i}\matF_i^T + \matG_i \matQ_i \matG_i^T. \end{aligned} \]

  • This doesn't really work
    • For small \(i\), the nominal trajectory is close to the true solution
    • Quickly breaks down as noise accumulates through \(\norm{g_i(\vecx_i)\vecu_i}\): the true trajectory drifts away from the nominal one (see the sketch below)
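  • One way to see the breakdown: propagate the nominal trajectory open loop next to a noisy simulated trajectory and watch the gap grow. The sketch below reuses the hypothetical pendulum model; the noise level is arbitrary.

```python
import numpy as np

# Open-loop nominal trajectory vs. a noisy simulated trajectory: the gap
# between them, i.e. the linearization error the LKF must absorb, grows
# as process noise accumulates. Model and noise level are illustrative.
dt, grav, length = 0.01, 9.81, 1.0
f = lambda x: np.array([x[0] + dt * x[1],
                        x[1] - dt * (grav / length) * np.sin(x[0])])
g_mat = np.array([[0.0], [dt]])
rng = np.random.default_rng(1)

x_nom = np.array([0.5, 0.0])
x_true = x_nom.copy()
gaps = []
for i in range(5000):
    x_nom = f(x_nom)                                    # deterministic, open loop
    x_true = f(x_true) + g_mat @ rng.normal(0.0, 0.3, size=1)
    gaps.append(np.linalg.norm(x_true - x_nom))
# gaps[i] typically becomes large, so x_nom is a poor linearization point.
```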

Extended Kalman filter (EKF)

  • Idea: relinearization at every step around the current estimate

  • Linearize around the most recent estimates: \[ f_i(\vecx_i) \approx f_i(\hat{\vecx}_{i|i}) + \matF_i (\vecx_i-\hat{\vecx}_{i|i}),\quad h_i(\vecx_i) \approx h_i(\hat{\vecx}_{i|i-1}) + \matH_i (\vecx_i-\hat{\vecx}_{i|i-1}) \] where \(\matF_i\) and \(\matH_i\) are now the Jacobians evaluated at \(\hat{\vecx}_{i|i}\) and \(\hat{\vecx}_{i|i-1}\), respectively

    \[ g_i(\vecx_i)\approx g_i(\hat{\vecx}_{i|i})\eqdef \matG_i \]

  • Treating the approximations as equalities gives the pseudo-linear model \[ \vecx_{i+1}=\matF_i \vecx_i + (f_i(\hat{\vecx}_{i|i})-\matF_i\hat{\vecx}_{i|i}) + \matG_i \vecu_i\qquad \vecy_i - (h_i(\hat{\vecx}_{i|i-1})-\matH_i\hat{\vecx}_{i|i-1})=\matH_i \vecx_i + \vecv_i \]

  • The EKF recursions follow: \[ \begin{aligned} \hat{\vecx}_{i+1|i} &= f_i(\hat{\vecx}_{i|i})\\ \hat{\vecx}_{i|i} &= \hat{\vecx}_{i|i-1}+\matK_{f,i} \left(\vecy_i-h_i(\hat{\vecx}_{i|i-1})\right)\\ \matK_{f,i}&= \matP_{i|i-1}\matH_i^T (\matH_i\matP_{i|i-1}\matH_i^T+\matR_i)^{-1}\\ \matP_{i|i}&=(\matI-\matK_{f,i}\matH_i)\matP_{i|i-1}\\ \matP_{i+1|i}&=\matF_i\matP_{i|i}\matF_i^T + \matG_i \matQ_i \matG_i^T. \end{aligned} \]
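  • A minimal EKF sketch in Python, assuming model functions and a Jacobian helper like the ones above (names and interfaces are mine, not a prescribed implementation):

```python
import numpy as np

def ekf_step(x_filt, P_filt, y_next, f, g, h, jac, Q, R):
    """One EKF cycle: from (x_{i|i}, P_{i|i}) and y_{i+1} to (x_{i+1|i+1}, P_{i+1|i+1}).
    f, g, h are the model functions; jac(func, x) returns the Jacobian of func at x."""
    # Time update, relinearized around the current filtered estimate
    F = jac(f, x_filt)
    G = g(x_filt)
    x_pred = f(x_filt)
    P_pred = F @ P_filt @ F.T + G @ Q @ G.T
    # Measurement update, relinearized around the predicted estimate
    H = jac(h, x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y_next - h(x_pred))
    P_new = (np.eye(len(x_new)) - K @ H) @ P_pred
    return x_new, P_new
```

    • With the pendulum sketches above, one would call ekf_step(x, P, y, f, g, h, jacobian, Q, R) once per measurement, starting from the prior mean and covariance.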

Beyond linear models

  • We are pushing the limits of Kalman filtering!
    • EKF is based on linearization to reduce a nonlinear state space model to a linear one
    • Perhaps the most widely used Kalman filter
    • Sensitive to tuning and how linearization is performed
  • At this stage it helps to take a step back
    • Kalman filtering is actually a special case of optimal filtering for probabilistic state space models
    • We will revisit Kalman filtering within that broader context
    • This will help understand EKF and other approaches (UKF, CKF, particle filtering) more easily
  • We're going to switch gears and do a bit more probability (and less linear algebra)

Probabilistic State Space Model

  • A probabilistic state space model consists of a state evolution and measurement model \[ x_{i+1} \sim p(x_{i+1}|x_{0:i}y_{0:i})\qquad y_i \sim p(y_i|x_{0:i}y_{0:i-1}) \] where \(x_i^\intercal = \mat{ccc}{x_{i,1}&\cdots&x_{i,n}}\) is the state and \(y_i^\intercal= \mat{ccc}{y_{i,1}&\cdots&y_{i,m}}\) is the measurement.

    We define \(x_{0:i}\eqdef \mat{ccc}{x_{0}&\cdots&x_{i}}\).

  • The dynamic model is Markovian if \(p(x_{i+1}|x_{0:i}y_{0:i}) = p(x_{i+1}|x_i)\). The measurement model satisfies conditional independence if \(p(y_i|x_{0:i}y_{0:i-1}) = p(y_i|x_i)\)

  • From now on, unless otherwise specified, we assume Markovianity and conditional independence hold
  • Illustration: functional dependence graphs, hidden Markov model

Functional dependence graphs

  • Consider \(m\) independent random variables and \(n\) functions of these variables. A functional dependence graph is a directed graph having \(m + n\) vertices, and in which edges are drawn from one vertex to another if the random variable of the former vertex is an argument in the function defining the latter.

  • Simple measurement model \(y = x + n\)
  • Let \(x,y\) be jointly distributed random variables with a well-defined PMF or PDF. There exists a random variable \(n\) independent of \(x\) and a function \(f\) such that \(y=f(x,n)\).

Functional dependence graphs

  • Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph \(\mathcal{G}\). The subset \(\calZ\) is said to d-separate \(\calX\) from \(\calY\) if there exists no path between a vertex of \(\calX\) and a vertex of \(\calY\) after the following operations have been performed:
    1. construct the subgraph \(\mathcal{G}'\) consisting of all vertices in \(\calX\), \(\calY\), and \(\calZ\), as well as the edges and vertices encountered when moving backward starting from any of the vertices in \(\calX\), \(\calY\), or \(\calZ\);
    2. in the subgraph \(\mathcal{G}'\), delete all edges coming out of \(\calZ\);
    3. remove all arrows in \(\mathcal{G}'\) to obtain an undirected graph.
  • Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph. If \(\calZ\) d-separates \(\calX\) from \(\calY\), and if we collect the random variables in \(\calX\), \(\calY\), and \(\calZ\) in the random vectors \(x\), \(y\), and \(z\), respectively, then \(x\rightarrow z\rightarrow y\) forms a Markov chain (\(x\) and \(y\) are conditionally independent given \(z\))

  • In the probabilistic state space model, past states \(x_{0:i-1}\) are independent of the future states \(x_{i+1:T}\) and measurements \(y_{i:T}\) given the present state \(x_i\).
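  • The sketch below is a direct transcription of the three-step procedure above for a functional dependence graph stored as a dictionary of parent lists; the toy graph at the end is a hypothetical hidden-Markov-style example.

```python
# Sketch of the d-separation test described above, on a functional dependence
# graph given as a dict {node: [parents]} (an illustrative implementation).
def d_separates(parents, X, Y, Z):
    """Return True if Z d-separates X from Y."""
    # 1. Keep X, Y, Z and everything reachable by moving backward (ancestors).
    keep, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, []))
    # 2./3. Build an undirected graph on 'keep', deleting edges that leave Z.
    adj = {v: set() for v in keep}
    for v in keep:
        for p in parents.get(v, []):
            if p in keep and p not in Z:   # drop the edge p -> v if p is in Z
                adj[v].add(p)
                adj[p].add(v)
    # Check whether any vertex of X can still reach a vertex of Y.
    seen, stack = set(), list(X)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v] - seen)
    return not (seen & Y)

# Toy graph: u0 -> x0 -> x1 <- u1, with measurements y0, y1 of x0, x1
parents = {"x0": ["u0"], "x1": ["x0", "u1"], "y0": ["x0", "v0"], "y1": ["x1", "v1"]}
print(d_separates(parents, {"y0"}, {"y1"}, {"x1"}))   # True: x1 blocks the path
```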

Bayesian optimal filtering

  • Bayesian optimal filtering consists in computing the distribution \(p(x_i|y_{0:i})\) (compare to \(\hat{x}_{i|i}\)) given
    1. Prior distribution \(p(x_0)\)
    2. Probabilistic state space model
    3. Measurement sequence \(y_{0:i}\)
  • The filtering distribution is computed recursively (a finite-state sketch follows this list):
    1. Initialization: start from known \(p(x_0)\)
    2. For \(i\geq 1\)
      1. Prediction: compute \(p(x_i|y_{0:i-1})\) by the Chapman-Kolmogorov equation \[ p(x_i|y_{0:i-1})= \int p(x_i|x_{i-1})p(x_{i-1}|y_{0:i-1})dx_{i-1} \]
      2. Update: compute \(p(x_i|y_{0:i})\) by Bayes' rule \[ p(x_i|y_{0:i})= \frac{1}{Z_i}p(y_i|x_i)p(x_i|y_{0:i-1})\textsf{ with } Z_i= \int p(y_i|x_i)p(x_i|y_{0:i-1})dx_i \]
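  • On a finite state space the recursion becomes a few lines of linear algebra; the transition matrix, likelihoods, and measurements below are toy values chosen only to illustrate the prediction and update steps.

```python
import numpy as np

# Bayesian filtering recursion on a finite state space, where the
# Chapman-Kolmogorov integral becomes a matrix-vector product and Bayes'
# rule a pointwise multiplication. A, B, and the measurements are toy values.
A = np.array([[0.8, 0.2, 0.0],     # A[k, j] = p(x_i = k | x_{i-1} = j)
              [0.2, 0.6, 0.2],
              [0.0, 0.2, 0.8]])
B = np.array([[0.9, 0.1],          # B[k, y] = p(y_i = y | x_i = k)
              [0.5, 0.5],
              [0.1, 0.9]])

p = np.array([1/3, 1/3, 1/3])      # initialization: prior p(x_0)
for y in [0, 0, 1, 1]:             # measurements y_1, y_2, ...
    p = A @ p                      # prediction (Chapman-Kolmogorov)
    p = B[:, y] * p                # update (unnormalized Bayes' rule)
    p /= p.sum()                   # normalization constant Z_i
# p is the filtering distribution over the states after the processed measurements
```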

Special case: Gauss-Markov model

  • A Gauss-Markov model is a linear state-space model driven by Gaussian noise \[ \vecx_{i+1} = \matF_i \vecx_i + \vecu_i\qquad \vecy_i = \matH_i \vecx_i +\vecv_i \] where \(\vecu_i\sim\calN(0,\matQ_i)\) and \(\vecv_i\sim\calN(0,\matR_i)\) are Gaussian white processes (assumed independent).

  • We assume that all variables are real-valued for simplicity

  • Let \(x\) and \(y\) be jointly distributed random variables \[ \mat{c}{x\\y}\sim\calN\left(\mat{c}{\mu_x\\\mu_y},\mat{cc}{\matR_x&\matR_{xy}\\\matR_{yx}&\matR_y}\right) \] Then \(x\sim\calN(\mu_x,\matR_x)\), \(y\sim\calN(\mu_y,\matR_y)\) and \[ \begin{aligned} x|y&\sim\calN(\mu_x+\matR_{xy}\matR_y^{-1}(y-\mu_y),\matR_x-\matR_{xy}\matR_y^{-1}\matR_{yx})\\ y|x&\sim\calN(\mu_y+\matR_{yx}\matR_x^{-1}(x-\mu_x),\matR_y-\matR_{yx}\matR_x^{-1}\matR_{xy}) \end{aligned} \]
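  • A quick numerical sanity check of the conditioning formula (the joint mean, covariance, and observed value below are arbitrary illustrative numbers):

```python
import numpy as np

# Numerical illustration of the Gaussian conditioning formula above, with a
# hypothetical scalar x and y (values chosen arbitrarily for the example).
mu = np.array([1.0, -2.0])                       # [mu_x, mu_y]
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])                   # [[R_x, R_xy], [R_yx, R_y]]
y_obs = -1.0

Rx, Rxy, Ryx, Ry = Sigma[0, 0], Sigma[0, 1], Sigma[1, 0], Sigma[1, 1]
cond_mean = mu[0] + Rxy / Ry * (y_obs - mu[1])   # mu_x + R_xy R_y^{-1} (y - mu_y)
cond_cov = Rx - Rxy / Ry * Ryx                   # R_x - R_xy R_y^{-1} R_yx

# Monte Carlo sanity check: keep samples whose y component is close to y_obs
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
near = samples[np.abs(samples[:, 1] - y_obs) < 0.02, 0]
# near.mean() and near.var() should be close to cond_mean and cond_cov
```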

Kalman filter revisited

  • All distributions are jointly Gaussian in a Gauss-Markov model.

  • The Kalman filter is the Bayesian optimal filter for the Gauss-Markov model
    1. Initialization: \(\vecx_0\sim\calN(\hat{\vecx}_{0|-1},\matP_0)\)
    2. Prediction: \(\vecx_{i|i-1}\sim\calN(\hat{\vecx}_{i|i-1},\matP_{i|i-1})\) with
    \[ \hat{\vecx}_{i|i-1} = \matF_{i-1} \hat{\vecx}_{i-1|i-1}\qquad \matP_{i|i-1} = \matF_{i-1} \matP_{i-1|i-1}\matF_{i-1}^\intercal + \matQ_{i-1} \]
    3. Update: \(\vecx_{i|i}\sim\calN(\hat{\vecx}_{i|i},\matP_{i|i})\) with

    \[ \begin{aligned} \hat{\vecx}_{i|i} &= \hat{\vecx}_{i|i-1}+\matK_{f,i}(\vecy_i-\matH_i\hat{\vecx}_{i|i-1})\\ \matK_{f,i} &= \matP_{i|i-1}\matH_i^\intercal(\matH_i\matP_{i|i-1}\matH_i^\intercal+\matR_i)^{-1}\\ \matP_{i|i}&= \matP_{i|i-1}-\matK_{f,i}\matH_i \matP_{i|i-1} \end{aligned} \]

  • Remark: our earlier analysis did not assume Gaussian distributions
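  • To connect the lemma with the update step, the sketch below (hypothetical matrices) forms the joint Gaussian of the state and measurement given past measurements and conditions on \(\vecy_i\); the conditioning formula reproduces exactly the Kalman update above.

```python
import numpy as np

# Sketch: one Kalman measurement update obtained directly from the Gaussian
# conditioning lemma rather than from the gain formula. Values are hypothetical.
H = np.array([[1.0, 0.0]])                  # measurement matrix H_i
R = np.array([[0.1]])                       # measurement noise covariance R_i
x_pred = np.array([0.3, -0.2])              # \hat{x}_{i|i-1}
P_pred = np.array([[1.0, 0.2],
                   [0.2, 0.5]])             # P_{i|i-1}
y = np.array([0.45])                        # observed y_i

# Joint Gaussian of (x_i, y_i) given y_{0:i-1}:
#   mean = [x_pred, H x_pred]
#   cov  = [[P_pred,    P_pred H^T        ],
#           [H P_pred,  H P_pred H^T + R  ]]
S = H @ P_pred @ H.T + R                    # innovation covariance
cross = P_pred @ H.T                        # cross-covariance R_xy
# Conditioning on y_i gives the Kalman update:
x_filt = x_pred + cross @ np.linalg.solve(S, y - H @ x_pred)
P_filt = P_pred - cross @ np.linalg.solve(S, cross.T)
```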