Probabilistic State-Space Models

Matthieu Bloch

Tuesday, November 1, 2022

Today in ECE 6555

  • Announcements
    • Office hours today in TSRB 423 (Back to normal)
    • 10 lectures left (including today)
    • Bonus assignment: due by November 7, 2022, only graded if it matters (look at it before the drop date!)
    • Kalman filtering mini project due November 10, 2022
      • Can address questions 1-8
      • Need more concepts to do the rest, covered this week!
  • Last time
    • Probabilistic state-space model
  • Today
    • Bayesian filtering
  • Questions?

Last time: Probabilistic State Space Model

  • A probabilistic state space model consists of a state evolution model and a measurement model \[ x_{i+1} \sim p(x_{i+1}|x_{0:i}y_{0:i})\qquad y_i \sim p(y_i|x_{0:i}y_{0:i-1}) \] where \(x_i^\intercal = \mat{ccc}{x_{i,1}&\cdots&x_{i,n}}\) is the state and \(y_i^\intercal= \mat{ccc}{y_{i,1}&\cdots&y_{i,m}}\) is the measurement.

    We define \(x_{0:i}\eqdef \mat{ccc}{x_{0}&\cdots&x_{i}}\).

  • The dynamic model is Markovian if \(p(x_{i+1}|x_{0:i}y_{0:i}) = p(x_{i+1}|x_i)\). The measurement model satisfies conditional independence if \(p(y_i|x_{0:i}y_{0:i-1}) = p(y_i|x_i)\)

  • From now on, unless otherwise specified, we assume Markovianity and conditional independence hold
  • Illustration: functional dependence graphs, hidden Markov model (a small simulation sketch is given below)
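
  • A minimal simulation sketch of such a model (a toy scalar random walk with noisy direct observations, my own example rather than one from the lecture), showing that the state update uses only the current state and each measurement uses only the current state:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(T=50, q=0.1, r=0.5):
    """Sample a trajectory from a Markovian state-space model.

    State:        x_{i+1} | x_i ~ N(x_i, q)   (random walk)
    Measurement:  y_i     | x_i ~ N(x_i, r)
    """
    xs, ys = [], []
    x = rng.normal(0.0, 1.0)                   # x_0 ~ p(x_0)
    for _ in range(T):
        ys.append(rng.normal(x, np.sqrt(r)))   # y_i depends only on x_i
        xs.append(x)
        x = rng.normal(x, np.sqrt(q))          # x_{i+1} depends only on x_i
    return np.array(xs), np.array(ys)

states, measurements = simulate()
```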

Last time: Functional dependence graphs

  • Consider \(m\) independent random variables and \(n\) functions of these variables. A functional dependence graph is a directed graph having \(m + n\) vertices, and in which edges are drawn from one vertex to another if the random variable of the former vertex is an argument in the function defining the latter.

  • Simple measurement model \(y = x + n\)
  • Let \(x,y\) be jointly distributed random variables with a well-defined PMF or PDF. There exists a random variable \(n\) independent of \(x\) and a function \(f\) such that \(y=f(x,n)\).

Last time: Functional dependence graphs

  • Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph \(\mathcal{G}\). The subset \(\calZ\) is said to d-separate \(\calX\) from \(\calY\) if there exists no path between a vertex of \(\calX\) and a vertex of \(\calY\) after the following operations have been performed:
    1. construct the subgraph \(\mathcal{G}'\) consisting of all vertices in \(\calX\), \(\calY\), and \(\calZ\), as well as the edges and vertices encountered when moving backward starting from any of the vertices in \(\calX\), \(\calY\), or \(\calZ\);
    2. in the subgraph \(\mathcal{G}'\), delete all edges coming out of \(\calZ\);
    3. remove all arrows in \(\mathcal{G}'\) to obtain an undirected graph.
  • Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph. If \(\calZ\) d-separates \(\calX\) from \(\calY\), and if we collect the random variables in \(\calX\), \(\calY\), and \(\calZ\) in the random vectors \(x\), \(y\), and \(z\), respectively, then \(x\rightarrow z\rightarrow y\) forms a Markov chain (\(x\) and \(y\) are conditionally independent given \(z\))

  • In the probabilistic state space model, past states \(x_{0:i-1}\) are independent of the future states \(x_{i+1:T}\) and measurements \(y_{i:T}\) given the present state \(x_i\) (this is exactly what the sketch below checks on a toy graph).
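
  • A minimal sketch of the d-separation procedure above, assuming the functional dependence graph is given as a networkx DiGraph; the toy graph is a short hidden Markov chain with the driving-noise vertices omitted for brevity:

```python
import networkx as nx

def d_separates(G, X, Y, Z):
    """Check whether Z d-separates X from Y in the functional dependence graph G."""
    # Step 1: keep X, Y, Z and everything reached by moving backward (ancestors).
    nodes = set(X) | set(Y) | set(Z)
    for v in list(nodes):
        nodes |= nx.ancestors(G, v)
    H = G.subgraph(nodes).copy()
    # Step 2: delete all edges coming out of Z.
    H.remove_edges_from([(u, w) for (u, w) in list(H.edges()) if u in Z])
    # Step 3: drop orientations and look for any remaining path between X and Y.
    U = H.to_undirected()
    return not any(nx.has_path(U, x, y) for x in X for y in Y)

# Toy hidden Markov chain: x0 -> x1 -> x2, with yi depending on xi
G = nx.DiGraph([("x0", "x1"), ("x1", "x2"),
                ("x0", "y0"), ("x1", "y1"), ("x2", "y2")])
# True: x_{0:0} is independent of (x_2, y_{1:2}) given x_1
print(d_separates(G, {"x0"}, {"x2", "y1", "y2"}, {"x1"}))
```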

Bayesian optimal filtering

  • Bayesian optimal filtering consists in computing the distribution \(p(x_i|y_{0:i})\) (compare to \(\hat{x}_{i|i}\)) given
    1. Prior distribution \(p(x_0)\)
    2. Probabilistic state space model
    3. Measurement sequence \(y_{0:i}\)
  • The filtering distribution is computed recursively (see the discrete-state sketch after this list)
    1. Initialization: start from the known prior \(p(x_0)\)
    2. For \(i\geq 1\)
      1. Prediction: compute \(p(x_i|y_{0:i-1})\) by the Chapman-Kolmogorov equation \[ p(x_i|y_{0:i-1})= \int p(x_i|x_{i-1})p(x_{i-1}|y_{0:i-1})dx_{i-1} \]
      2. Update: compute \(p(x_i|y_{0:i})\) by Bayes' rule \[ p(x_i|y_{0:i})= \frac{1}{Z_i}p(y_i|x_i)p(x_i|y_{0:i-1})\textsf{ with } Z_i= \int p(y_i|x_i)p(x_i|y_{0:i-1})dx_i \]
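
  • A minimal numerical sketch of one step of this recursion, assuming a finite state space so the Chapman-Kolmogorov integral becomes a matrix-vector product (the transition and likelihood values below are illustrative placeholders):

```python
import numpy as np

def bayes_filter_step(prior, A, lik):
    """One prediction/update step of the Bayesian filter on a finite state space.

    prior : p(x_{i-1} = k | y_{0:i-1}), shape (n,)
    A     : transition matrix, A[j, k] = p(x_i = j | x_{i-1} = k)
    lik   : likelihood vector, lik[j] = p(y_i | x_i = j)
    """
    pred = A @ prior          # Chapman-Kolmogorov: sum over x_{i-1}
    post = lik * pred         # Bayes' rule, unnormalized
    return post / post.sum()  # normalize by Z_i

# Tiny two-state example (illustrative numbers only)
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])    # columns sum to 1
prior = np.array([0.5, 0.5])
lik = np.array([0.7, 0.1])    # p(y_i | x_i = j) for the observed y_i
print(bayes_filter_step(prior, A, lik))
```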

Special case: Gauss-Markov model

  • A Gauss-Markov model is a linear state-space model driven by Gaussian noise \[ \vecx_{i+1} = \matF_i \vecx_i + \vecu_i\qquad \vecy_i = \matH_i \vecx_i +\vecv_i \] where \(\vecu_i\sim\calN(0,\matQ_i)\) and \(\vecv_i\sim\calN(0,\matR_i)\) are Gaussian white processes (assumed independent).

  • We assume that all variables are real-valued for simplicity

  • Let \(x\) and \(y\) be jointly distributed random variables \[ \mat{c}{x\\y}\sim\calN\left(\mat{c}{\mu_x\\\mu_y},\mat{cc}{\matR_x&\matR_{xy}\\\matR_{yx}&\matR_y}\right) \] Then \(x\sim\calN(\mu_x,\matR_x)\), \(y\sim\calN(\mu_y,\matR_y)\) and \[ \begin{aligned} x|y&\sim\calN(\mu_x+\matR_{xy}\matR_y^{-1}(y-\mu_y),\matR_x-\matR_{xy}\matR_y^{-1}\matR_{yx})\\ y|x&\sim\calN(\mu_y+\matR_{yx}\matR_x^{-1}(x-\mu_x),\matR_y-\matR_{yx}\matR_x^{-1}\matR_{xy}) \end{aligned} \]
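
  • A quick numerical sketch of the conditioning formula, with illustrative numbers of my choosing, computing the mean and covariance of \(x\) given an observed value of \(y\):

```python
import numpy as np

# Joint Gaussian parameters (illustrative values)
mu_x, mu_y = np.array([1.0]), np.array([0.0])
Rx  = np.array([[2.0]])     # Cov(x)
Rxy = np.array([[0.8]])     # Cov(x, y)
Ry  = np.array([[1.5]])     # Cov(y)

y_obs = np.array([1.2])     # observed value of y

# x | y ~ N(mu_x + Rxy Ry^{-1} (y - mu_y), Rx - Rxy Ry^{-1} Ryx)
gain = Rxy @ np.linalg.inv(Ry)
cond_mean = mu_x + gain @ (y_obs - mu_y)
cond_cov  = Rx - gain @ Rxy.T
print(cond_mean, cond_cov)
```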

Properties of Gaussian distributions

  • Let \(x\sim\calN(\mu_x,\Sigma_x)\) and let \(y|x\sim\calN(Hx,R)\) (e.g., \(y = Hx + v\)). Then \[ \mat{c}{x\\y}\sim\calN\left(\mat{c}{\mu_x\\H\mu_x},\mat{cc}{\Sigma_x&\Sigma_x H^\intercal\\H \Sigma_x&H\Sigma_x H^\intercal + R}\right) \]

  • Let \(x\sim\calN(\mu_x,\Sigma_x)\) and let \(y=Ax\). Then \(y\sim\calN(A\mu_x,A\Sigma_x A^\intercal)\).

  • The states and measurements are all jointly Gaussian in a Gauss-Markov model.
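
  • Combining the two facts above already gives the Kalman update (a short derivation sketch filling in the step between this slide and the next): take the predicted distribution \(\vecx_i|\vecy_{0:i-1}\sim\calN(\hat{\vecx}_{i|i-1},\matP_{i|i-1})\) and the measurement model \(\vecy_i|\vecx_i\sim\calN(\matH_i\vecx_i,\matR_i)\); the first property gives \[ \mat{c}{\vecx_i\\\vecy_i}\Big|\vecy_{0:i-1}\sim\calN\left(\mat{c}{\hat{\vecx}_{i|i-1}\\\matH_i\hat{\vecx}_{i|i-1}},\mat{cc}{\matP_{i|i-1}&\matP_{i|i-1}\matH_i^\intercal\\\matH_i\matP_{i|i-1}&\matH_i\matP_{i|i-1}\matH_i^\intercal+\matR_i}\right) \] and conditioning on \(\vecy_i\) (previous slide) yields \[ \vecx_i|\vecy_{0:i}\sim\calN\left(\hat{\vecx}_{i|i-1}+\matK_{f,i}(\vecy_i-\matH_i\hat{\vecx}_{i|i-1}),\ \matP_{i|i-1}-\matK_{f,i}\matH_i\matP_{i|i-1}\right) \] with \(\matK_{f,i}\eqdef\matP_{i|i-1}\matH_i^\intercal(\matH_i\matP_{i|i-1}\matH_i^\intercal+\matR_i)^{-1}\), which is exactly the Kalman update of the next slide.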

Kalman filter revisited

  • The Kalman filter is the Bayesian optimal filter for the Gauss-Markov model
    1. Initialization: \(\vecx_0\sim\calN(\hat{\vecx}_{0|-1},\matP_0)\)
    2. Prediction: \(\vecx_{i|i-1}\sim\calN(\hat{\vecx}_{i|i-1},\matP_{i|i-1})\) with
    \[ \hat{\vecx}_{i|i-1} = \matF_{i-1} \hat{\vecx}_{i-1|i-1}\qquad \matP_{i|i-1} = \matF_{i-1} \matP_{i-1|i-1}\matF_{i-1}^\intercal + \matQ_{i-1} \]
    3. Update: \(\vecx_{i|i}\sim\calN(\hat{\vecx}_{i|i},\matP_{i|i})\) with

    \[ \begin{aligned} \hat{\vecx}_{i|i} &= \hat{\vecx}_{i|i-1}+\matK_{f,i}(\vecy_i-\matH_i\hat{\vecx}_{i|i-1})\\ \matK_{f,i} &= \matP_{i|i-1}\matH_i^\intercal(\matH_i\matP_{i|i-1}\matH_i^\intercal+\matR_i)^{-1}\\ \matP_{i|i}&= \matP_{i|i-1}-\matK_{f,i}\matH_i \matP_{i|i-1} \end{aligned} \]

  • Remark: our earlier derivation of the Kalman filter did not assume Gaussian distributions; it only required second-order statistics (means and covariances)
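
  • A minimal code sketch of these prediction and update steps, assuming numpy arrays of compatible shapes (function and variable names are mine, not part of the lecture):

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Prediction: p(x_i | y_{0:i-1}) = N(F x_{i-1|i-1}, F P F^T + Q)."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x_pred, P_pred, y, H, R):
    """Update: condition the predicted Gaussian on the new measurement y_i."""
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain K_{f,i}
    x = x_pred + K @ (y - H @ x_pred)
    P = P_pred - K @ H @ P_pred
    return x, P
```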

Extended Kalman filtering

  • Consider a model of the form \[ x_i = f(x_{i-1}) + u_{i-1}\quad y_i = h(x_i) + v_i \] where \(u_i\sim\calN(0,Q_i)\) and \(v_i\sim\calN(0,R_i)\) are Gaussian white processes (assumed independent).

  • This is a special case of the general probabilistic state space model.

  • In general, there is no closed-form solution for nonlinear \(f\) and \(h\)

  • Solution: linearize the nonlinear functions \[ f(x) \approx f(x_0) + F_x(x_0)(x-x_0)\qquad h(x) \approx h(x_0) + H_x(x_0)(x-x_0) \] where \(F_x\) and \(H_x\) are the Jacobian matrices of \(f\) and \(h\), respectively

  • Can be understood from the perspective of approximate transformation of variables

Linear Approximations of Non-Linear Transforms

  • Consider \(x\sim\calN(\mu,P)\) and \(y=g(x)\). The probability density of \(y\) is not Gaussian but can be approximated as \[ y\approx \calN\left(g(\mu),G_x(\mu)PG_x(\mu)^\intercal\right) \] where \(G_x\) is the Jacobian matrix of \(g\)

  • Consider \(x\sim\calN(\mu,P)\), \(q\sim\calN(0,Q)\) and \(y=g(x)+q\). The joint probability density of \(x,y\) is not Gaussian but can be approximated as \[ \mat{c}{x\\y}\approx \calN\left(\mat{c}{\mu\\g(\mu)},\mat{cc}{P & P G_x(\mu)^\intercal\\G_x(\mu)P&G_x(\mu)PG_x(\mu)^\intercal + Q}\right) \]
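
  • A small sanity check of the linearized approximation, using an example nonlinearity of my choosing (\(g(x)=\sin x\)) and comparing against Monte Carlo estimates of the moments:

```python
import numpy as np

rng = np.random.default_rng(1)

g  = lambda x: np.sin(x)      # example nonlinearity
Gx = lambda x: np.cos(x)      # its Jacobian (scalar derivative here)

mu, P = 0.5, 0.04             # x ~ N(mu, P)

# Linearized approximation: y ~ N(g(mu), Gx(mu) P Gx(mu)^T)
lin_mean, lin_var = g(mu), Gx(mu) * P * Gx(mu)

# Monte Carlo reference
xs = rng.normal(mu, np.sqrt(P), size=100_000)
mc_mean, mc_var = g(xs).mean(), g(xs).var()

print(f"linearized:  mean={lin_mean:.4f}, var={lin_var:.5f}")
print(f"Monte Carlo: mean={mc_mean:.4f}, var={mc_var:.5f}")
```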

Extended Kalman filter revisited

  • The extended Kalman filter is an approximate Bayesian optimal filter for this nonlinear model
    1. Initialization: \(x_0\sim\calN(\hat{x}_{0|-1},P_0)\)
    2. Prediction: \(x_{i|i-1}\sim\calN(\hat{x}_{i|i-1},P_{i|i-1})\) with
    \[ \hat{x}_{i|i-1} = f(\hat{x}_{i-1|i-1})\qquad P_{i|i-1} = F_x(\hat{x}_{i-1|i-1}) P_{i-1|i-1}F_x(\hat{x}_{i-1|i-1})^\intercal + Q_{i-1} \]
    3. Update: \(x_{i|i}\sim\calN(\hat{x}_{i|i},P_{i|i})\) with

    \[ \begin{aligned} \hat{x}_{i|i} &= \hat{x}_{i|i-1}+K_{f,i}(y_i-h(\hat{x}_{i|i-1}))\\ K_{f,i} &= P_{i|i-1}H_x(\hat{x}_{i|i-1})^\intercal(H_x(\hat{x}_{i|i-1})P_{i|i-1}H_x(\hat{x}_{i|i-1})^\intercal+R_i)^{-1}\\ P_{i|i}&= P_{i|i-1}-K_{f,i}H_x(\hat{x}_{i|i-1}) P_{i|i-1} \end{aligned} \]
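
  • A minimal code sketch mirroring the equations above, assuming \(f\), \(h\), and their Jacobians are supplied as functions (names are illustrative, not from the lecture):

```python
import numpy as np

def ekf_predict(x, P, f, F_jac, Q):
    """Prediction: propagate the mean through f and linearize around x_{i-1|i-1}."""
    F = F_jac(x)
    return f(x), F @ P @ F.T + Q

def ekf_update(x_pred, P_pred, y, h, H_jac, R):
    """Update: linearize the measurement model around the predicted state x_{i|i-1}."""
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x = x_pred + K @ (y - h(x_pred))
    P = P_pred - K @ H @ P_pred
    return x, P
```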