Matthieu Bloch
Tuesday, November 1, 2022
A probabilistic state space model consists of a state evolution model and a measurement model \[ x_{i+1} \sim p(x_{i+1}|x_{0:i}y_{0:i})\qquad y_i \sim p(y_i|x_{0:i}y_{0:i-1}) \] where \(x_i^\intercal = \mat{ccc}{x_{i,1}&\cdots&x_{i,n}}\) is the state and \(y_i^\intercal = \mat{ccc}{y_{i,1}&\cdots&y_{i,m}}\) is the measurement.
We define \(x_{0:i}\eqdef \mat{ccc}{x_{0}&\cdots&x_{i}}\).
The dynamic model is Markovian if \(p(x_{i+1}|x_{0:i}y_{0:i}) = p(x_{i+1}|x_i)\). The measurement model satisfies conditional independence if \(p(y_i|x_{0:i}y_{0:i-1}) = p(y_i|x_i)\).
Consider \(m\) independent random variables and \(n\) functions of these variables. A functional dependence graph is a directed graph having \(m + n\) vertices, and in which edges are drawn from one vertex to another if the random variable of the former vertex is an argument in the function defining the latter.
Let \(x,y\) be jointly distributed random variables with a well-defined PMF or PDF. There exists a random variable \(n\) independent of \(x\) and a function \(f\) such that \(y=f(x,n)\).
Let \(\calX\), \(\calY\), and \(\calZ\) be disjoint subsets of vertices in a functional dependence graph. If \(\calZ\) d-separates \(\calX\) from \(\calY\), and if we collect the random variables in \(\calX\), \(\calY\), and \(\calZ\) in the random vectors \(x\), \(y\), and \(z\), respectively, then \(x\rightarrow z\rightarrow y\) forms a Markov chain (\(x\) and \(y\) are conditionally independent given \(z\)).
In the probabilistic state space model, past states \(x_{0:i-1}\) are independent of the future states \(x_{i+1:T}\) and measurements \(y_{i:T}\) given the present state \(x_i\).
A Gauss-Markov model is a Gaussian-driven linear model \[ \vecx_{i+1} = \matF_i \vecx_i + \vecu_i\qquad \vecy_i = \matH_i \vecx_i +\vecv_i \] where \(\vecu_i\sim\calN(0,\matQ_i)\) and \(\vecv_i\sim\calN(0,\matR_i)\) are Gaussian white processes (assumed independent).
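The model above can be simulated directly by drawing the white processes and iterating the recursion. A minimal scalar sketch (the values of \(F\), \(H\), \(Q\), \(R\), and the horizon are illustrative, not from the notes):

```python
import numpy as np

# Simulate a scalar Gauss-Markov model x_{i+1} = F x_i + u_i, y_i = H x_i + v_i
# with u_i ~ N(0, Q) and v_i ~ N(0, R).  All numerical values are illustrative.
rng = np.random.default_rng(0)
F, H, Q, R = 0.9, 1.0, 0.1, 0.5
T = 100

x = np.zeros(T)
y = np.zeros(T)
for i in range(1, T):
    x[i] = F * x[i - 1] + rng.normal(0.0, np.sqrt(Q))   # state evolution
for i in range(T):
    y[i] = H * x[i] + rng.normal(0.0, np.sqrt(R))       # measurement
```

The same loop generalizes to vector states by replacing the scalars with matrices and `rng.normal` with `rng.multivariate_normal`.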
We assume that all variables are real-valued for simplicity.
Let \(x\) and \(y\) be jointly distributed random variables \[ \mat{c}{x\\y}\sim\calN\left(\mat{c}{\mu_x\\\mu_y},\mat{cc}{\matR_x&\matR_{xy}\\\matR_{yx}&\matR_y}\right) \] Then \(x\sim\calN(\mu_x,\matR_x)\), \(y\sim\calN(\mu_y,\matR_y)\) and \[ \begin{aligned} x|y&\sim\calN(\mu_x+\matR_{xy}\matR_y^{-1}(y-\mu_y),\matR_x-\matR_{xy}\matR_y^{-1}\matR_{yx})\\ y|x&\sim\calN(\mu_y+\matR_{yx}\matR_x^{-1}(x-\mu_x),\matR_y-\matR_{yx}\matR_x^{-1}\matR_{xy}) \end{aligned} \]
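The conditioning formula above is a direct computation; a scalar sketch (all numerical values are illustrative):

```python
import numpy as np

# Conditioning for a jointly Gaussian pair (x, y): given an observation
# y_obs, compute the mean and variance of x | y = y_obs.
mu_x, mu_y = 1.0, 2.0
R_x, R_y, R_xy = 2.0, 1.0, 0.5      # scalars, so R_yx = R_xy
y_obs = 3.0

mu_cond = mu_x + R_xy / R_y * (y_obs - mu_y)   # E[x | y]
R_cond = R_x - R_xy / R_y * R_xy               # Var[x | y]
```

For vector-valued \(x,y\) the divisions become multiplications by \(\matR_y^{-1}\), exactly as in the formula.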
Let \(x\sim\calN(\mu_x,\Sigma_x)\) and let \(y|x\sim\calN(Hx,R)\) (e.g., \(y = Hx + v\)). Then \[ \mat{c}{x\\y}\sim\calN\left(\mat{c}{\mu_x\\H\mu_x},\mat{cc}{\Sigma_x&\Sigma_x H^\intercal\\H \Sigma_x&H\Sigma_x H^\intercal + R}\right) \]
Let \(x\sim\calN(\mu_x,\Sigma_x)\) and let \(y=Ax\). Then \(y\sim\calN(A\mu_x,A\Sigma_x A^\intercal)\).
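The affine-transformation rule is easy to verify numerically; a small sketch (the choice of \(A\), \(\mu_x\), \(\Sigma_x\) is illustrative):

```python
import numpy as np

# Affine image of a Gaussian: y = A x has mean A mu and covariance A Sigma A^T.
mu = np.array([1.0, 0.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
A = np.array([[1.0, 1.0]])      # y = x_1 + x_2

mean_y = A @ mu                 # A mu
cov_y = A @ Sigma @ A.T         # A Sigma A^T
```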
All distributions are jointly Gaussian in a Gauss-Markov model.
\[ \begin{aligned} \hat{\vecx}_{i|i} &= \hat{\vecx}_{i|i-1}+\matK_{f,i}(\vecy_i-\matH_i\hat{\vecx}_{i|i-1})\\ \matK_{f,i} &= \matP_{i|i-1}\matH_i^\intercal(\matH_i\matP_{i|i-1}\matH_i^\intercal+\matR_i)^{-1}\\ \matP_{i|i}&= \matP_{i|i-1}-\matK_{f,i}\matH_i \matP_{i|i-1} \end{aligned} \] where the predicted quantities follow from the dynamic model as \(\hat{\vecx}_{i|i-1} = \matF_{i-1}\hat{\vecx}_{i-1|i-1}\) and \(\matP_{i|i-1} = \matF_{i-1}\matP_{i-1|i-1}\matF_{i-1}^\intercal + \matQ_{i-1}\).
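One Kalman measurement update can be sketched as follows (the predicted mean/covariance and the model matrices are illustrative values, not from the notes):

```python
import numpy as np

# Single Kalman measurement update: combine the predicted (prior) mean and
# covariance with a new measurement y.  All numerical values are illustrative.
x_pred = np.array([0.0, 1.0])    # predicted mean  x_{i|i-1}
P_pred = np.eye(2)               # predicted covariance  P_{i|i-1}
H = np.array([[1.0, 0.0]])       # observe the first coordinate
R = np.array([[0.5]])            # measurement noise covariance
y = np.array([0.4])

S = H @ P_pred @ H.T + R                 # innovation covariance
K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain K_{f,i}
x_filt = x_pred + K @ (y - H @ x_pred)   # filtered mean  x_{i|i}
P_filt = P_pred - K @ H @ P_pred         # filtered covariance  P_{i|i}
```

Only the observed coordinate is corrected here; the unobserved one keeps its prior mean and variance, as expected.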
Consider a model of the form \[ x_i = f(x_{i-1}) + u_{i-1}\quad y_i = h(x_i) + v_i \] where \(u_i\sim\calN(0,Q_i)\) and \(v_i\sim\calN(0,R_i)\) are Gaussian white processes (assumed independent).
This is a special case of the general probabilistic state space model.
There is no closed-form solution for non-linear \(f\) and \(h\).
Solution: linearize the non-linear functions \[ f(x) \approx f(x_0) + F_x(x_0)(x-x_0)\qquad h(x) \approx h(x_0) + H_x(x_0)(x-x_0) \] where \(F_x\) and \(H_x\) are the Jacobian matrices of \(f\), \(h\), respectively.
Can be understood from the perspective of approximate transformation of variables
Consider \(x\sim\calN(\mu,P)\) and \(y=g(x)\). The probability density of \(y\) is not Gaussian but can be approximated as \[ y\approx \calN\left(g(\mu),G_x(\mu)PG_x(\mu)^\intercal\right) \] where \(G_x\) is the Jacobian matrix of \(g\)
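The quality of this approximation can be checked against Monte Carlo sampling; a scalar sketch with an illustrative choice \(g(x)=\sin(x)\) (for small \(P\) the linearization is close to the sampled mean and variance):

```python
import numpy as np

# Compare the linearized Gaussian approximation of y = g(x) with Monte Carlo.
# g(x) = sin(x) and the values of mu, P are illustrative.
rng = np.random.default_rng(1)
mu, P = 0.2, 0.01

approx_mean = np.sin(mu)             # g(mu)
G = np.cos(mu)                       # scalar Jacobian G_x(mu)
approx_var = G * P * G               # G_x(mu) P G_x(mu)^T

samples = np.sin(rng.normal(mu, np.sqrt(P), size=100_000))
# samples.mean() and samples.var() should be close to the approximations
```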
Consider \(x\sim\calN(\mu,P)\), \(q\sim\calN(0,Q)\) and \(y=g(x)+q\). The joint probability density of \(x,y\) is not Gaussian but can be approximated as \[ \mat{c}{x\\y}\approx \calN\left(\mat{c}{\mu\\g(\mu)},\mat{cc}{P & P G_x(\mu)^\intercal\\G_x(\mu)P&G_x(\mu)PG_x(\mu)^\intercal + Q}\right) \]
\[ \begin{aligned} \hat{x}_{i|i} &= \hat{x}_{i|i-1}+K_{f,i}(y_i-h(\hat{x}_{i|i-1}))\\ K_{f,i} &= P_{i|i-1}H_x(\hat{x}_{i|i-1})^\intercal(H_x(\hat{x}_{i|i-1})P_{i|i-1}H_x(\hat{x}_{i|i-1})^\intercal+R_i)^{-1}\\ P_{i|i}&= P_{i|i-1}-K_{f,i}H_x(\hat{x}_{i|i-1}) P_{i|i-1} \end{aligned} \]
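A single extended-Kalman measurement update, with the measurement Jacobian evaluated at the predicted state, can be sketched as follows (the measurement function \(h(x)=x^2\) and all numerical values are illustrative):

```python
# One EKF measurement update for a scalar model with h(x) = x^2.
# The Jacobian H_x is evaluated at the predicted state x_pred.
def h(x):
    return x ** 2

def H_x(x):          # Jacobian (derivative) of h
    return 2 * x

x_pred, P_pred = 1.5, 0.2    # predicted mean and covariance
R = 0.1                      # measurement noise variance
y = 2.0                      # measurement

Hj = H_x(x_pred)
S = Hj * P_pred * Hj + R             # innovation covariance
K = P_pred * Hj / S                  # EKF gain K_{f,i}
x_filt = x_pred + K * (y - h(x_pred))
P_filt = P_pred - K * Hj * P_pred
```

Note that the gain and covariance update have exactly the Kalman-filter form, with \(H_i\) replaced by the Jacobian of \(h\).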