Probabilistic State-Space Models

Matthieu Bloch

Tuesday, November 1, 2022

Today in ECE 6555

  • Announcements
    • Office hours today in TSRB 423 (Back to normal)
    • 10 lectures left (including today)
    • Bonus assignment: due by November 7, 2022, only graded if it matters (look at it before the drop date!)
    • Kalman filtering mini project due November 10, 2022
      • Can address questions 1-8
      • Need more concepts to do the rest, covered this week!
  • Last time
    • Probabilistic state-space model
  • Today
    • Bayesian filtering
  • Questions?

Last time: Probabilistic State Space Model

  • A probabilistic state space model consists of a state evolution model and a measurement model \[ x_{i+1} \sim p(x_{i+1}|x_{0:i}y_{0:i})\qquad y_i \sim p(y_i|x_{0:i}y_{0:i-1}) \] where \(x_i^\intercal = \mat{ccc}{x_{i,1}&\cdots&x_{i,n}}\) is the state and \(y_i^\intercal= \mat{ccc}{y_{i,1}&\cdots&y_{i,m}}\) is the measurement.

    We define \(x_{0:i}\eqdef \mat{ccc}{x_{0}&\cdots&x_{i}}\).

  • The dynamic model is Markovian if \(p(x_{i+1}|x_{0:i}y_{0:i}) = p(x_{i+1}|x_i)\). The measurement model satisfies conditional independence if \(p(y_i|x_{0:i}y_{0:i-1}) = p(y_i|x_i)\)

  • From now on, unless otherwise specified, we assume Markovianity and conditional independence hold
  • Illustration: functional dependence graphs, hidden Markov model (a small simulation sketch is given below)
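
  • A minimal simulation sketch of such a model (a toy scalar random walk with noisy direct observations, my own example rather than one from the lecture), showing that the state update uses only the current state and each measurement uses only the current state:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(T=50, q=0.1, r=0.5):
    """Sample a trajectory from a Markovian state-space model.

    State:        x_{i+1} | x_i ~ N(x_i, q)   (random walk)
    Measurement:  y_i     | x_i ~ N(x_i, r)
    """
    xs, ys = [], []
    x = rng.normal(0.0, 1.0)                   # x_0 ~ p(x_0)
    for _ in range(T):
        ys.append(rng.normal(x, np.sqrt(r)))   # y_i depends only on x_i
        xs.append(x)
        x = rng.normal(x, np.sqrt(q))          # x_{i+1} depends only on x_i
    return np.array(xs), np.array(ys)

states, measurements = simulate()
```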

Last time: Functional dependence graphs

  • Consider \(m\) independent random variables and \(n\) functions of these variables. A functional dependence graph is a directed graph having \(m + n\) vertices, and in which edges are drawn from one vertex to another if the random variable of the former vertex is an argument in the function defining the latter.

  • Simple measurement model \(y = x + n\)
  • Let \(x,y\) be jointly distributed random variables with a well-defined PMF or PDF. There exists a random variable \(n\) independent of \(x\) and a function \(f\) such that \(y=f(x,n)\).

Last time: Functional dependence graphs

  • Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph \(\mathcal{G}\). The subset \(\calZ\) is said to d-separate \(\calX\) from \(\calY\) if there exists no path between a vertex of \(\calX\) and a vertex of \(\calY\) after the following operations have been performed:
    1. construct the subgraph \(\mathcal{G}'\) consisting of all vertices in \(\calX\), \(\calY\), and \(\calZ\), as well as the edges and vertices encountered when moving backward starting from any of the vertices in \(\calX\), \(\calY\), or \(\calZ\);
    2. in the subgraph \(\mathcal{G}'\), delete all edges coming out of \(\calZ\);
    3. remove all arrows in \(\mathcal{G}'\) to obtain an undirected graph.
  • Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph. If \(\calZ\) d-separates \(\calX\) from \(\calY\), and if we collect the random variables in \(\calX\), \(\calY\), and \(\calZ\) in the random vectors \(x\), \(y\), and \(z\), respectively, then \(x\rightarrow z\rightarrow y\) forms a Markov chain (\(x\) and \(y\) are conditionally independent given \(z\))

  • In the probabilistic state space model, past states \(x_{0:i-1}\) are independent of the future states \(x_{i+1:T}\) and measurements \(y_{i:T}\) given the present state \(x_i\) (this is exactly what the sketch below checks on a toy graph).
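
  • A minimal sketch of the d-separation procedure above, assuming the functional dependence graph is given as a networkx DiGraph; the toy graph is a short hidden Markov chain with the driving-noise vertices omitted for brevity:

```python
import networkx as nx

def d_separates(G, X, Y, Z):
    """Check whether Z d-separates X from Y in the functional dependence graph G."""
    # Step 1: keep X, Y, Z and everything reached by moving backward (ancestors).
    nodes = set(X) | set(Y) | set(Z)
    for v in list(nodes):
        nodes |= nx.ancestors(G, v)
    H = G.subgraph(nodes).copy()
    # Step 2: delete all edges coming out of Z.
    H.remove_edges_from([(u, w) for (u, w) in list(H.edges()) if u in Z])
    # Step 3: drop orientations and look for any remaining path between X and Y.
    U = H.to_undirected()
    return not any(nx.has_path(U, x, y) for x in X for y in Y)

# Toy hidden Markov chain: x0 -> x1 -> x2, with yi depending on xi
G = nx.DiGraph([("x0", "x1"), ("x1", "x2"),
                ("x0", "y0"), ("x1", "y1"), ("x2", "y2")])
# True: x_{0:0} is independent of (x_2, y_{1:2}) given x_1
print(d_separates(G, {"x0"}, {"x2", "y1", "y2"}, {"x1"}))
```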

Bayesian optimal filtering

  • Bayesian optimal filtering consists in computing the distribution \(p(x_i|y_{0:i})\) (compare to \(\hat{x}_{i|i}\)) given
    1. Prior distribution \(p(x_0)\)
    2. Probabilistic state space model
    3. Measurement sequence \(y_{0:i}\)
  • The filtering distribution is computed recursively (see the discrete-state sketch after this list)
    1. Initialization: start from the known prior \(p(x_0)\)
    2. For \(i\geq 1\)
      1. Prediction: compute \(p(x_i|y_{0:i-1})\) by the Chapman-Kolmogorov equation \[ p(x_i|y_{0:i-1})= \int p(x_i|x_{i-1})p(x_{i-1}|y_{0:i-1})dx_{i-1} \]
      2. Update: compute \(p(x_i|y_{0:i})\) by Bayes' rule \[ p(x_i|y_{0:i})= \frac{1}{Z_i}p(y_i|x_i)p(x_i|y_{0:i-1})\textsf{ with } Z_i= \int p(y_i|x_i)p(x_i|y_{0:i-1})dx_i \]
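
  • A minimal numerical sketch of one step of this recursion, assuming a finite state space so the Chapman-Kolmogorov integral becomes a matrix-vector product (the transition and likelihood values below are illustrative placeholders):

```python
import numpy as np

def bayes_filter_step(prior, A, lik):
    """One prediction/update step of the Bayesian filter on a finite state space.

    prior : p(x_{i-1} = k | y_{0:i-1}), shape (n,)
    A     : transition matrix, A[j, k] = p(x_i = j | x_{i-1} = k)
    lik   : likelihood vector, lik[j] = p(y_i | x_i = j)
    """
    pred = A @ prior          # Chapman-Kolmogorov: sum over x_{i-1}
    post = lik * pred         # Bayes' rule, unnormalized
    return post / post.sum()  # normalize by Z_i

# Tiny two-state example (illustrative numbers only)
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])    # columns sum to 1
prior = np.array([0.5, 0.5])
lik = np.array([0.7, 0.1])    # p(y_i | x_i = j) for the observed y_i
print(bayes_filter_step(prior, A, lik))
```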

Special case: Gauss-Markov model

  • A Gauss-Markov model is a linear state-space model driven by Gaussian noise \[ \vecx_{i+1} = \matF_i \vecx_i + \vecu_i\qquad \vecy_i = \matH_i \vecx_i +\vecv_i \] where \(\vecu_i\sim\calN(0,\matQ_i)\) and \(\vecv_i\sim\calN(0,\matR_i)\) are Gaussian white processes (assumed independent).

  • We assume that all variables are real-valued for simplicity

  • Let \(x\) and \(y\) be jointly distributed random variables \[ \mat{c}{x\\y}\sim\calN\left(\mat{c}{\mu_x\\\mu_y},\mat{cc}{\matR_x&\matR_{xy}\\\matR_{yx}&\matR_y}\right) \] Then \(x\sim\calN(\mu_x,\matR_x)\), \(y\sim\calN(\mu_y,\matR_y)\) and \[ \begin{aligned} x|y&\sim\calN(\mu_x+\matR_{xy}\matR_y^{-1}(y-\mu_y),\matR_x-\matR_{xy}\matR_y^{-1}\matR_{yx})\\ y|x&\sim\calN(\mu_y+\matR_{yx}\matR_x^{-1}(x-\mu_x),\matR_y-\matR_{yx}\matR_x^{-1}\matR_{xy}) \end{aligned} \]
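
  • A quick numerical sketch of the conditioning formula, with illustrative numbers of my choosing, computing the mean and covariance of \(x\) given an observed value of \(y\):

```python
import numpy as np

# Joint Gaussian parameters (illustrative values)
mu_x, mu_y = np.array([1.0]), np.array([0.0])
Rx  = np.array([[2.0]])     # Cov(x)
Rxy = np.array([[0.8]])     # Cov(x, y)
Ry  = np.array([[1.5]])     # Cov(y)

y_obs = np.array([1.2])     # observed value of y

# x | y ~ N(mu_x + Rxy Ry^{-1} (y - mu_y), Rx - Rxy Ry^{-1} Ryx)
gain = Rxy @ np.linalg.inv(Ry)
cond_mean = mu_x + gain @ (y_obs - mu_y)
cond_cov  = Rx - gain @ Rxy.T
print(cond_mean, cond_cov)
```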

Properties of Gaussian distributions

  • Let \(x\sim\calN(\mu_x,\Sigma_x)\) and let \(y|x\sim\calN(Hx,R)\) (e.g., \(y = Hx + v\)). Then \[ \mat{c}{x\\y}\sim\calN\left(\mat{c}{\mu_x\\H\mu_x},\mat{cc}{\Sigma_x&\Sigma_x H^\intercal\\H \Sigma_x&H\Sigma_x H^\intercal + R}\right) \]

  • Let \(x\sim\calN(\mu_x,\Sigma_x)\) and let \(y=Ax\). Then \(y\sim\calN(A\mu_x,A\Sigma_x A^\intercal)\).

  • The states and measurements are all jointly Gaussian in a Gauss-Markov model.
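
  • Combining the two facts above already gives the Kalman update (a short derivation sketch filling in the step between this slide and the next): take the predicted distribution \(\vecx_i|\vecy_{0:i-1}\sim\calN(\hat{\vecx}_{i|i-1},\matP_{i|i-1})\) and the measurement model \(\vecy_i|\vecx_i\sim\calN(\matH_i\vecx_i,\matR_i)\); the first property gives \[ \mat{c}{\vecx_i\\\vecy_i}\Big|\vecy_{0:i-1}\sim\calN\left(\mat{c}{\hat{\vecx}_{i|i-1}\\\matH_i\hat{\vecx}_{i|i-1}},\mat{cc}{\matP_{i|i-1}&\matP_{i|i-1}\matH_i^\intercal\\\matH_i\matP_{i|i-1}&\matH_i\matP_{i|i-1}\matH_i^\intercal+\matR_i}\right) \] and conditioning on \(\vecy_i\) (previous slide) yields \[ \vecx_i|\vecy_{0:i}\sim\calN\left(\hat{\vecx}_{i|i-1}+\matK_{f,i}(\vecy_i-\matH_i\hat{\vecx}_{i|i-1}),\ \matP_{i|i-1}-\matK_{f,i}\matH_i\matP_{i|i-1}\right) \] with \(\matK_{f,i}\eqdef\matP_{i|i-1}\matH_i^\intercal(\matH_i\matP_{i|i-1}\matH_i^\intercal+\matR_i)^{-1}\), which is exactly the Kalman update of the next slide.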

Kalman filter revisited

  • The Kalman filter is the Bayesian optimal filter for the Gauss-Markov model
    1. Initialization: \(\vecx_0\sim\calN(\hat{\vecx}_{0|-1},\matP_0)\)
    2. Prediction: \(\vecx_{i|i-1}\sim\calN(\hat{\vecx}_{i|i-1},\matP_{i|i-1})\) with
    \[ \hat{\vecx}_{i|i-1} = \matF_{i-1} \hat{\vecx}_{i-1|i-1}\qquad \matP_{i|i-1} = \matF_{i-1} \matP_{i-1|i-1}\matF_{i-1}^\intercal + \matQ_{i-1} \]
    3. Update: \(\vecx_{i|i}\sim\calN(\hat{\vecx}_{i|i},\matP_{i|i})\) with

    \[ \begin{aligned} \hat{\vecx}_{i|i} &= \hat{\vecx}_{i|i-1}+\matK_{f,i}(\vecy_i-\matH_i\hat{\vecx}_{i|i-1})\\ \matK_{f,i} &= \matP_{i|i-1}\matH_i^\intercal(\matH_i\matP_{i|i-1}\matH_i^\intercal+\matR_i)^{-1}\\ \matP_{i|i}&= \matP_{i|i-1}-\matK_{f,i}\matH_i \matP_{i|i-1} \end{aligned} \]

  • Remark: our earlier derivation of the Kalman filter did not assume Gaussian distributions; it only required second-order statistics (means and covariances)
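
  • A minimal code sketch of these prediction and update steps, assuming numpy arrays of compatible shapes (function and variable names are mine, not part of the lecture):

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Prediction: p(x_i | y_{0:i-1}) = N(F x_{i-1|i-1}, F P F^T + Q)."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x_pred, P_pred, y, H, R):
    """Update: condition the predicted Gaussian on the new measurement y_i."""
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain K_{f,i}
    x = x_pred + K @ (y - H @ x_pred)
    P = P_pred - K @ H @ P_pred
    return x, P
```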

Extended Kalman filtering

  • Consider a model of the form \[ x_i = f(x_{i-1}) + u_{i-1}\quad y_i = h(x_i) + v_i \] where \(u_i\sim\calN(0,Q_i)\) and \(v_i\sim\calN(0,R_i)\) are Gaussian white processes (assumed independent).

  • This is a special case of the general probabilistic state space model.

  • In general, there is no closed-form solution for nonlinear \(f\) and \(h\)

  • Solution: linearize the nonlinear functions \[ f(x) \approx f(x_0) + F_x(x_0)(x-x_0)\qquad h(x) \approx h(x_0) + H_x(x_0)(x-x_0) \] where \(F_x\) and \(H_x\) are the Jacobian matrices of \(f\) and \(h\), respectively

  • Can be understood from the perspective of approximate transformation of variables

Linear Approximations of Non-Linear Transforms

  • Consider \(x\sim\calN(\mu,P)\) and \(y=g(x)\). The probability density of \(y\) is not Gaussian but can be approximated as \[ y\approx \calN\left(g(\mu),G_x(\mu)PG_x(\mu)^\intercal\right) \] where \(G_x\) is the Jacobian matrix of \(g\)

  • Consider \(x\sim\calN(\mu,P)\), \(q\sim\calN(0,Q)\) and \(y=g(x)+q\). The joint probability density of \(x,y\) is not Gaussian but can be approximated as \[ \mat{c}{x\\y}\approx \calN\left(\mat{c}{\mu\\g(\mu)},\mat{cc}{P & P G_x(\mu)^\intercal\\G_x(\mu)P&G_x(\mu)PG_x(\mu)^\intercal + Q}\right) \]
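
  • A small sanity check of the linearized approximation, using an example nonlinearity of my choosing (\(g(x)=\sin x\)) and comparing against Monte Carlo estimates of the moments:

```python
import numpy as np

rng = np.random.default_rng(1)

g  = lambda x: np.sin(x)      # example nonlinearity
Gx = lambda x: np.cos(x)      # its Jacobian (scalar derivative here)

mu, P = 0.5, 0.04             # x ~ N(mu, P)

# Linearized approximation: y ~ N(g(mu), Gx(mu) P Gx(mu)^T)
lin_mean, lin_var = g(mu), Gx(mu) * P * Gx(mu)

# Monte Carlo reference
xs = rng.normal(mu, np.sqrt(P), size=100_000)
mc_mean, mc_var = g(xs).mean(), g(xs).var()

print(f"linearized:  mean={lin_mean:.4f}, var={lin_var:.5f}")
print(f"Monte Carlo: mean={mc_mean:.4f}, var={mc_var:.5f}")
```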

Extended Kalman filter revisited

  • The extended Kalman filter is an approximate Bayesian optimal filter for this nonlinear model
    1. Initialization: \(x_0\sim\calN(\hat{x}_{0|-1},P_0)\)
    2. Prediction: \(x_{i|i-1}\sim\calN(\hat{x}_{i|i-1},P_{i|i-1})\) with
    \[ \hat{x}_{i|i-1} = f(\hat{x}_{i-1|i-1})\qquad P_{i|i-1} = F_x(\hat{x}_{i-1|i-1}) P_{i-1|i-1}F_x(\hat{x}_{i-1|i-1})^\intercal + Q_{i-1} \]
    3. Update: \(x_{i|i}\sim\calN(\hat{x}_{i|i},P_{i|i})\) with

    \[ \begin{aligned} \hat{x}_{i|i} &= \hat{x}_{i|i-1}+K_{f,i}(y_i-h(\hat{x}_{i|i-1}))\\ K_{f,i} &= P_{i|i-1}H_x(\hat{x}_{i|i-1})^\intercal(H_x(\hat{x}_{i|i-1})P_{i|i-1}H_x(\hat{x}_{i|i-1})^\intercal+R_i)^{-1}\\ P_{i|i}&= P_{i|i-1}-K_{f,i}H_x(\hat{x}_{i|i-1}) P_{i|i-1} \end{aligned} \]
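
  • A minimal code sketch mirroring the equations above, assuming \(f\), \(h\), and their Jacobians are supplied as functions (names are illustrative, not from the lecture):

```python
import numpy as np

def ekf_predict(x, P, f, F_jac, Q):
    """Prediction: propagate the mean through f and linearize around x_{i-1|i-1}."""
    F = F_jac(x)
    return f(x), F @ P @ F.T + Q

def ekf_update(x_pred, P_pred, y, h, H_jac, R):
    """Update: linearize the measurement model around the predicted state x_{i|i-1}."""
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x = x_pred + K @ (y - h(x_pred))
    P = P_pred - K @ H @ P_pred
    return x, P
```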