More Kalman filtering

Matthieu Bloch

Thursday, October 20, 2022

Today in ECE 6555

  • Announcements
    • Office hours next Tuesday October 25, 2022 in TSRB 132 (instead of 432)
    • Midterm graded - HW grades coming up
    • Drop date: October
    • Bonus assignment: due by drop date, only graded if it matters
    • Kalman filtering mini project!
  • Last time
    • Time/Measurement update
    • Quite a few simplifications if we avoid corner cases of singular matrices
  • Today's plan
    • Linearized Kalman filtering, Extended Kalman filtering
  • Questions?

Nonlinear filtering

  • Many interesting practical systems are not linear! Consider the nonlinear discrete-time model \[ x_{i+1} = f_i(x_i) + g_i(x_i) u_i\qquad y_i = h_i(x_i)+v_i \] where \(f_i(\cdot)\), \(g_i(\cdot)\), and \(h_i(\cdot)\) are nonlinear and time-varying (a toy numerical example is sketched at the end of this slide).

  • \(x_0\) is assumed to be random with mean \(\bar{x}_0\) and \[ \begin{aligned} \dotp{\mat{c}{ x_0-\bar{x}_0\\ u_i\\ v_i}}{\mat{c}{ x_0-\bar{x}_0\\ u_j\\ v_j\\1}} = \mat{cccc}{\Pi_0&0&0&0\\0& Q_i\delta_{ij}&0&0\\0&0&R_i\delta_{ij}&0}. \end{aligned} \]

  • Question: can we leverage what we obtained for linear filtering?
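
  • As a concrete, purely illustrative example of such a model (hypothetical, not from the course), here is a discretized pendulum with a nonlinear, noisy angle measurement, sketched in Python/NumPy:

    import numpy as np

    # State x = [angle, angular velocity]; the measurement is a noisy sin(angle).
    dt, grav, q, r = 0.01, 9.81, 1e-3, 1e-2

    def f(x):                                   # nonlinear dynamics f_i(x)
        return np.array([x[0] + dt * x[1], x[1] - dt * grav * np.sin(x[0])])

    def g(x):                                   # noise coupling g_i(x) (state-independent here)
        return np.array([[0.0], [dt]])

    def h(x):                                   # nonlinear measurement h_i(x)
        return np.array([np.sin(x[0])])

    def simulate(x0, n_steps, rng=np.random.default_rng(0)):
        """Sample x_{i+1} = f(x_i) + g(x_i) u_i and y_i = h(x_i) + v_i."""
        xs, ys = [np.asarray(x0, dtype=float)], []
        for _ in range(n_steps):
            x = xs[-1]
            ys.append(h(x) + np.sqrt(r) * rng.standard_normal(1))
            xs.append(f(x) + g(x) @ (np.sqrt(q) * rng.standard_normal(1)))
        return np.array(xs), np.array(ys)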

Linearized Kalman filter (LKF)

  • Idea: open loop linearization of state-space equations around a known nominal trajectory \(x_i^\text{nom}\). \[ x_0^{\text{nom}} = x_0\qquad x^{\text{nom}}_{i+1} = f_i(x^{\text{nom}}_{i}) \]

    • Nominal trajectory is deterministic
    • We can write \(x_i = x^{\text{nom}}_{i}+\Delta x_i\) where \(\Delta x_i\) is random
  • Assume \(\set{f_i,g_i,h_i}_{i\geq 0}\) are smooth enough to allow a first-order Taylor expansion \[ f_i(x_i) \approx f_i(x^{\text{nom}}_{i}) + F_i \Delta x_i\qquad h_i(x_i)\approx h_i(x_i^{\text{nom}}) + H_i \Delta x_i \] with \(F_i\) and \(H_i\) defined as \[ F_i\eqdef \left.\frac{\partial f_i(x)}{\partial x}\right\vert_{x=x^{\text{nom}}_{i}}\qquad H_i\eqdef \left.\frac{\partial h_i(x)}{\partial x}\right\vert_{x=x^{\text{nom}}_{i}} \]

  • Make the zeroth-order approximation \(g_i(x_i)\approx g_i(x^{\text{nom}}_{i})\eqdef G_i\) (a numerical sketch of the Jacobians follows below)
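
  • One simple way to obtain the Jacobians and the nominal trajectory numerically; the helper names are hypothetical and the finite-difference step is a design choice:

    import numpy as np

    def jacobian(func, x, eps=1e-6):
        """Numerical Jacobian of func at x via forward differences."""
        x = np.asarray(x, dtype=float)
        fx = np.atleast_1d(func(x))
        J = np.zeros((fx.size, x.size))
        for k in range(x.size):
            dx = np.zeros(x.size)
            dx[k] = eps
            J[:, k] = (np.atleast_1d(func(x + dx)) - fx) / eps
        return J

    def nominal_trajectory(f, x0, n_steps):
        """x_0^nom = x_0 and x_{i+1}^nom = f_i(x_i^nom): deterministic, no noise."""
        xs = [np.asarray(x0, dtype=float)]
        for _ in range(n_steps):
            xs.append(f(xs[-1]))
        return xs

    # F_i = jacobian(f, x_nom[i]) and H_i = jacobian(h, x_nom[i]) along the nominal trajectory.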

Linearized Kalman filter (LKF)

  • The (approximate) state-space model for the perturbation \(\Delta x_i\) is \[ \Delta x_{i+1} = F_i \Delta x_i + G_i u_i\qquad y_i-h_i(x^{\text{nom}}_{i}) = H_i \Delta x_i + v_i \]

  • Applying the standard Kalman filter recursions to this linearized model yields (see the code sketch at the end of this slide) \[ \begin{aligned} \hat{x}_{i+1|i} &= F_i (\hat{x}_{i|i} - x_i^{\text{nom}}) + f_i(x_i^{\text{nom}})\\ \hat{x}_{i|i} &= \hat{x}_{i|i-1}+K_{f,i} \left(y_i-h_i(x_i^{\text{nom}})-H_i (\hat{x}_{i|i-1}-x_i^{\text{nom}})\right)\\ K_{f,i}&= P_{i|i-1}H_i^T (H_iP_{i|i-1}H_i^T+R_i)^{-1}\\ P_{i|i}&=(I-K_{f,i}H_i)P_{i|i-1}\\ P_{i+1|i}&=F_iP_{i|i}F_i^T + G_i Q_i G_i^T. \end{aligned} \]

  • This doesn't really work in practice
    • For small \(i\), the nominal trajectory remains close to the true trajectory
    • It quickly breaks down as noise accumulates through \(\norm{g_i(x_i)u_i}\) and the true trajectory drifts away from the nominal one
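
  • As an illustration, here is a minimal NumPy sketch of the LKF recursions above, assuming the hypothetical f, g, h, jacobian, and nominal_trajectory helpers from the earlier sketches (Q and R are covariance matrices):

    import numpy as np

    def linearized_kf(f, g, h, x_nom, ys, P0, Q, R, jacobian):
        """Linearized Kalman filter around a precomputed nominal trajectory x_nom."""
        n = P0.shape[0]
        x_hat = np.asarray(x_nom[0], dtype=float)   # \hat{x}_{0|-1}, taking x_0^nom = \bar{x}_0
        P = np.asarray(P0, dtype=float)
        filtered = []
        for i, y in enumerate(ys):
            Fi, Hi, Gi = jacobian(f, x_nom[i]), jacobian(h, x_nom[i]), g(x_nom[i])
            # Measurement update around the nominal trajectory
            K = P @ Hi.T @ np.linalg.inv(Hi @ P @ Hi.T + R)
            x_filt = x_hat + K @ (y - h(x_nom[i]) - Hi @ (x_hat - x_nom[i]))
            P_filt = (np.eye(n) - K @ Hi) @ P
            filtered.append(x_filt)
            # Time update around the nominal trajectory
            x_hat = Fi @ (x_filt - x_nom[i]) + f(x_nom[i])
            P = Fi @ P_filt @ Fi.T + Gi @ Q @ Gi.T
        return filtered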

Extended Kalman filter (EKF)

  • Idea: relinearization at every step around the current estimate

  • Relinearize at each step: \[ f_i(x_i) \approx f_i(\hat{x}_{i|i}) + F_i (x_i-\hat{x}_{i|i}),\quad h_i(x_i) \approx h_i(\hat{x}_{i|i-1}) + H_i (x_i-\hat{x}_{i|i-1}),\quad g_i(x_i)\approx g_i(\hat{x}_{i|i})\eqdef G_i \] where \(F_i\) and \(H_i\) are now the Jacobians evaluated at \(\hat{x}_{i|i}\) and \(\hat{x}_{i|i-1}\), respectively

  • Substituting these approximations gives a linear model with known (data-dependent) offsets \[ x_{i+1}=F_i x_i + (f_i(\hat{x}_{i|i})-F_i\hat{x}_{i|i}) + G_i u_i\qquad y_i - (h_i(\hat{x}_{i|i-1})-H_i\hat{x}_{i|i-1})=H_i x_i + v_i \]

  • The EKF recursions become (sketched in code below) \[ \begin{aligned} \hat{x}_{i+1|i} &= f_i(\hat{x}_{i|i})\\ \hat{x}_{i|i} &= \hat{x}_{i|i-1}+K_{f,i} \left(y_i-h_i(\hat{x}_{i|i-1})\right)\\ K_{f,i}&= P_{i|i-1}H_i^T (H_iP_{i|i-1}H_i^T+R_i)^{-1}\\ P_{i|i}&=(I-K_{f,i}H_i)P_{i|i-1}\\ P_{i+1|i}&=F_iP_{i|i}F_i^T + G_i Q_i G_i^T. \end{aligned} \]
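
  • A matching minimal sketch of the EKF recursions, reusing the same hypothetical helpers as before:

    import numpy as np

    def extended_kf(f, g, h, ys, x0_hat, P0, Q, R, jacobian):
        """Extended Kalman filter: relinearize around the current estimate at every step."""
        n = P0.shape[0]
        x_pred = np.asarray(x0_hat, dtype=float)    # \hat{x}_{0|-1}
        P_pred = np.asarray(P0, dtype=float)        # P_{0|-1}
        filtered = []
        for y in ys:
            # Measurement update: linearize h around \hat{x}_{i|i-1}
            Hi = jacobian(h, x_pred)
            K = P_pred @ Hi.T @ np.linalg.inv(Hi @ P_pred @ Hi.T + R)
            x_filt = x_pred + K @ (y - h(x_pred))
            P_filt = (np.eye(n) - K @ Hi) @ P_pred
            filtered.append(x_filt)
            # Time update: linearize f (and evaluate g) around \hat{x}_{i|i}
            Fi, Gi = jacobian(f, x_filt), g(x_filt)
            x_pred = f(x_filt)
            P_pred = Fi @ P_filt @ Fi.T + Gi @ Q @ Gi.T
        return filtered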

Beyond linear models

  • We are pushing the limits of Kalman filtering!
    • EKF is based on linearization to reduce a nonlinear state space model to a linear one
    • Perhaps the most widely used variant of the Kalman filter
    • Sensitive to tuning and how linearization is performed
  • At this stage it helps to take a step back
    • Kalman filtering is actually a special case of optimal filtering for probabilistic state space models
    • We will revisit Kalman filtering within that broader context
    • This will help understand EKF and other approaches (UKF, CKF, particle filtering) more easily
  • We're going to switch gears and do a bit more probability (and less linear algebra)

Probabilistic State Space Model

  • A probabilistic state space model consists of a state evolution and measurement model \[ x_{i+1} \sim p(x_{i+1}|x_{0:i}y_{0:i})\qquad y_i \sim p(y_i|x_{0:i}y_{0:i-1}) \] where \(x_i^\intercal = \mat{ccc}{x_{i,1}&\cdots&x_{i,n}}\) is the state and \(y_i^\intercal= \mat{ccc}{y_{i,1}&\cdots&y_{i,m}}\) is the measurement (see the sampling sketch at the end of this slide).

    We define \(x_{0:i}\eqdef \mat{ccc}{x_{0}&\cdots&x_{i}}\).

  • The dynamic model is Markovian if \(p(x_{i+1}|x_{0:i}y_{0:i}) = p(x_{i+1}|x_i)\). The measurement model satisfies conditional independence if \(p(y_i|x_{0:i}y_{0:i-1}) = p(y_i|x_i)\)

  • From now on, unless otherwise specified, we assume Markovianity and conditional independence hold

  • Illustration: functional dependence graphs, hidden Markov model
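
  • A minimal sketch (hypothetical numbers) of what these assumptions mean operationally: simulating the model below only ever evaluates \(p(x_{i+1}|x_i)\) and \(p(y_i|x_i)\).

    import numpy as np

    # Scalar Gaussian random-walk state with noisy direct observations.
    rng = np.random.default_rng(0)
    T, q, r = 50, 0.1, 0.5                                 # horizon, process/measurement variances
    x, y = np.zeros(T), np.zeros(T)
    x[0] = rng.normal(0.0, 1.0)                            # prior p(x_0) = N(0, 1)
    y[0] = x[0] + rng.normal(0.0, np.sqrt(r))              # p(y_0|x_0) = N(x_0, r)
    for i in range(T - 1):
        x[i + 1] = x[i] + rng.normal(0.0, np.sqrt(q))      # Markov: p(x_{i+1}|x_{0:i}, y_{0:i}) = N(x_i, q)
        y[i + 1] = x[i + 1] + rng.normal(0.0, np.sqrt(r))  # cond. indep.: p(y_i|x_{0:i}, y_{0:i-1}) = N(x_i, r)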

Functional dependence graphs

  • Consider \(m\) independent random variables and \(n\) functions of these variables. A functional dependence graph is a directed graph having \(m + n\) vertices, and in which edges are drawn from one vertex to another if the random variable of the former vertex is an argument in the function defining the latter.

  • Simple measurement model \(y = x + n\)
  • Let \(x,y\) be jointly distributed random variables with a well-defined PMF or PDF. There exists a random variable \(n\) independent of \(x\) and a function \(f\) such that \(y=f(x,n)\).

Functional dependence graphs

  • Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph \(\mathcal{G}\). The subset \(\calZ\) is said to d-separate \(\calX\) from \(\calY\) if there exists no path between a vertex of \(\calX\) and a vertex of \(\calY\) after the following operations have been performed:
    1. construct the subgraph \(\mathcal{G}'\) consisting of all vertices in \(\calX\), \(\calY\), and \(\calZ\), as well as the edges and vertices encountered when moving backward starting from any of the vertices in \(\calX\), \(\calY\), or \(\calZ\);
    2. in the subgraph \(\mathcal{G}'\), delete all edges coming out of \(\calZ\);
    3. remove all arrows in \(\mathcal{G}'\) to obtain an undirected graph.
  • Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph. If \(\calZ\) d-separates \(\calX\) from \(\calY\), and if we collect the random variables in \(\calX\), \(\calY\), and \(\calZ\) in the random vectors \(x\), \(y\), and \(z\), respectively, then \(x\rightarrow z\rightarrow y\) forms a Markov chain (\(x\) and \(y\) are conditionally independent given \(z\))

  • In the probabilistic state space model, past states \(x_{0:i-1}\) are independent of the future states \(x_{i+1:T}\) and measurements \(y_{i:T}\) given the present state \(x_i\).

Bayesian optimal filtering

  • Bayesian optimal filtering consists in computing the distribution \(p(x_i|y_{0:i})\) (compare to \(\hat{x}_{i|i}\)) given
    1. Prior distribution \(p(x_0)\)
    2. Probabilistic state space model
    3. Measurement sequence \(y_{0:i}\)
  • The distribution is computed recursively (a numerical grid-based sketch follows below)
    1. Initialization: start from known \(p(x_0)\)
    2. For \(i\geq 1\)
      1. Prediction: compute \(p(x_i|y_{0:i-1})\) by the Chapman-Kolmogorov equation
      \[ p(x_i|y_{0:i-1})= \int p(x_i|x_{i-1})p(x_{i-1}|y_{0:i-1})dx_{i-1} \]
      2. Update: compute \(p(x_i|y_{0:i})\) by Bayes' rule
      \[ p(x_i|y_{0:i})= \frac{1}{Z_i}p(y_i|x_i)p(x_i|y_{0:i-1})\textsf{ with } Z_i= \int p(y_i|x_i)p(x_i|y_{0:i-1})dx_i \]
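
  • A brute-force numerical illustration of the prediction/update steps above on a discretized 1-D state space; the function names and arguments are hypothetical:

    import numpy as np

    def grid_bayes_filter(grid, prior, transition_pdf, likelihood, ys):
        """Bayesian optimal filter by quadrature on a 1-D grid.
        grid: (N,) points; prior: p(x_0) on the grid; transition_pdf(x_i, x_prev): p(x_i|x_{i-1});
        likelihood(y, x): p(y_i|x_i). A prediction precedes every update (measurements y_1, y_2, ...)."""
        dx = grid[1] - grid[0]
        p = prior / (prior.sum() * dx)                 # normalized p(x_0)
        # Transition kernel K[j, k] = p(x_i = grid[j] | x_{i-1} = grid[k])
        K = transition_pdf(grid[:, None], grid[None, :])
        posteriors = []
        for y in ys:
            # Prediction (Chapman-Kolmogorov): integrate p(x_i|x_{i-1}) p(x_{i-1}|y_{0:i-1}) over x_{i-1}
            p_pred = K @ p * dx
            # Update (Bayes' rule): multiply by the likelihood p(y_i|x_i) and renormalize by Z_i
            p = likelihood(y, grid) * p_pred
            p /= p.sum() * dx
            posteriors.append(p.copy())
        return posteriors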

Special case: Gauss-Markov model

  • A Gauss-Markov model is a Gaussian driven linear model \[ x_{i+1} = F_i x_i + u_i\qquad y_i = H_i x_i + v_i \] where \(u_i\sim\calN(0,Q_i)\) and \(v_i\sim\calN(0,R_i)\) are Gaussian white processes (assumed independent).

  • We assume that all variables are real-valued for simplicity

  • Let \(x\) and \(y\) be jointly distributed random variables \[ \mat{c}{x\\y}\sim\calN\left(\mat{c}{\mu_x\\\mu_y},\mat{cc}{R_x&R_{xy}\\R_{yx}&R_y}\right) \] Then \(x\sim\calN(\mu_x,R_x)\), \(y\sim\calN(\mu_y,R_y)\) and \[ \begin{aligned} x|y&\sim\calN(\mu_x+R_{xy}R_y^{-1}(y-\mu_y),R_x-R_{xy}R_y^{-1}R_{yx})\\ y|x&\sim\calN(\mu_y+R_{yx}R_x^{-1}(x-\mu_x),R_y-R_{yx}R_x^{-1}R_{xy}) \end{aligned} \]
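
  • A quick numerical sanity check of the conditioning formulas, with hypothetical numbers:

    import numpy as np

    rng = np.random.default_rng(1)
    mu_x, mu_y = np.array([1.0]), np.array([-0.5])
    Rx, Rxy, Ry = np.array([[2.0]]), np.array([[0.8]]), np.array([[1.5]])

    y_obs = np.array([0.3])
    cond_mean = mu_x + Rxy @ np.linalg.solve(Ry, y_obs - mu_y)   # mu_x + Rxy Ry^{-1} (y - mu_y)
    cond_cov = Rx - Rxy @ np.linalg.solve(Ry, Rxy.T)             # Rx - Rxy Ry^{-1} Ryx

    # Monte Carlo check: sample the joint Gaussian and keep x for y close to y_obs
    samples = rng.multivariate_normal(np.concatenate([mu_x, mu_y]),
                                      np.block([[Rx, Rxy], [Rxy.T, Ry]]), size=200_000)
    near = np.abs(samples[:, 1] - y_obs[0]) < 0.05
    print(cond_mean, cond_cov)
    print(samples[near, 0].mean(), samples[near, 0].var())       # ~ cond_mean, cond_cov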

Kalman filter revisited

  • All distributions are jointly Gaussian in a Gauss-Markov model.

  • The Kalman filter is the Bayesian optimal filter for the Gauss-Markov model
    1. Initialization: \(x_0\sim\calN(\hat{x}_{0|-1},P_0)\)
    2. Prediction: \(x_{i|i-1}\sim\calN(\hat{x}_{i|i-1},P_{i|i-1})\) with
    \[ \hat{x}_{i|i-1} = F_{i-1} \hat{x}_{i-1|i-1}\qquad P_{i|i-1} = F_{i-1} P_{i-1|i-1}F_{i-1}^\intercal + Q_{i-1} \]
    3. Update: \(x_{i|i}\sim\calN(\hat{x}_{i|i},P_{i|i})\) with

    \[ \begin{aligned} \hat{x}_{i|i} &= \hat{x}_{i|i-1}+K_{f,i}(y_i-H_i\hat{x}_{i|i-1})\\ K_{f,i} &= P_{i|i-1}H_i^\intercal(H_iP_{i|i-1}H_i^\intercal+R_i)^{-1}\\ P_{i|i}&= P_{i|i-1}-K_{f,i}H_i P_{i|i-1} \end{aligned} \]

  • Remark: our initial analysis did not assume Gaussian distributions; it only used second-order statistics (a code sketch of the recursions above is given below)
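
  • A minimal NumPy sketch of the recursions above (time-invariant \(F\), \(H\), \(Q\), \(R\) for brevity; the names are hypothetical):

    import numpy as np

    def kalman_filter(F, H, Q, R, ys, x0_hat, P0):
        """Kalman filter as the Bayesian filter of a Gauss-Markov model: every predictive and
        filtering distribution is Gaussian, so only means and covariances are propagated."""
        x_pred = np.asarray(x0_hat, dtype=float)   # \hat{x}_{0|-1}
        P_pred = np.asarray(P0, dtype=float)       # P_0
        means, covs = [], []
        for y in ys:
            # Update: condition N(\hat{x}_{i|i-1}, P_{i|i-1}) on y_i (Gaussian conditioning lemma)
            K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
            x_filt = x_pred + K @ (y - H @ x_pred)
            P_filt = P_pred - K @ H @ P_pred
            means.append(x_filt)
            covs.append(P_filt)
            # Prediction: propagate the Gaussian through x_{i+1} = F x_i + u_i
            x_pred = F @ x_filt
            P_pred = F @ P_filt @ F.T + Q
        return means, covs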