Particle Filtering

Matthieu Bloch

Thursday, November 10, 2022

Today in ECE 6555

  • Announcements
    • 7 lectures left (including today)
    • Kalman filtering mini project due today (late deadline November 13, 2022)
  • Last time
    • Importance sampling
  • Today
    • Particle Filtering
    • Gaussian processes
  • Questions?

Last time: Monte-Carlo approximations

  • Gaussian approximations often work well…
    • but cannot capture multi-modal distributions, discrete components, etc.
  • Particle filters are Monte-Carlo methods that approximate the solution of the Bayesian filtering equations
  • Monte Carlo approximation
    • The central difficulty in Bayesian inference is to compute \[ \E{g(\vecx)|\vecy_{1:T}} \eqdef \int g(\vecx)p(\vecx|\vecy_{1:T})d\vecx \] where \(g:\bbR^n\to\bbR^m\)
    • No closed-form expression in general
    • Approach
      • Draw \(n\) samples \(\vecx_i\sim p(\vecx|\vecy_{1:T})\)
      • Estimate \[ \E{g(\vecx)|\vecy_{1:T}} \approx \frac{1}{n}\sum_{i=1}^n g(\vecx_i) \]
    • The approximation error converges as \(\calO(n^{-1/2})\) regardless of the dimension (see the numerical sketch below)
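  • A minimal numerical sketch of the Monte-Carlo estimator (illustrative only: the posterior \(p(\vecx|\vecy_{1:T})\) is taken to be a known one-dimensional Gaussian so that we can sample from it directly and compare against the exact answer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration: pretend the posterior p(x | y_{1:T}) is N(1, 0.5^2),
# so we can sample from it directly and know the exact expectation.
def sample_posterior(n):
    return rng.normal(loc=1.0, scale=0.5, size=n)

g = lambda x: x**2            # quantity of interest E[g(x) | y_{1:T}]
exact = 1.0**2 + 0.5**2       # E[x^2] = mean^2 + variance = 1.25

for n in (10**2, 10**4, 10**6):
    x = sample_posterior(n)
    estimate = np.mean(g(x))  # (1/n) sum_i g(x_i)
    print(n, estimate, abs(estimate - exact))
```

  • Multiplying \(n\) by 100 shrinks the error by roughly a factor of 10, consistent with the \(\calO(n^{-1/2})\) rate.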

Last time: Importance sampling

  • It might still be challenging to sample according to \(p(\vecx|\vecy_{1:T})\) (complicated functional form)
  • Importance sampling
    • Use an importance distribution \(\pi(\vecx|\vecy_{1:T})\) \[ \E{g(\vecx)|\vecy_{1:T}} \eqdef \int g(\vecx)p(\vecx|\vecy_{1:T})d\vecx = \int g(\vecx)\frac{p(\vecx|\vecy_{1:T})}{\pi(\vecx|\vecy_{1:T})}\pi(\vecx|\vecy_{1:T})d\vecx \] assuming \(p(\vecx|\vecy_{1:T}) \ll \pi(\vecx|\vecy_{1:T})\), i.e., the importance distribution puts mass wherever the posterior does
    • Draw \(n\) samples \(\vecx^{(i)}\sim \pi(\vecx|\vecy_{1:T})\)
    • Estimate \[ \E{g(\vecx)|\vecy_{1:T}} \approx \sum_{i=1}^n \tilde{w}^{(i)} g(\vecx^{(i)})\textsf{ where } \tilde{w}^{(i)} \eqdef \frac{1}{n}\frac{p(\vecx^{(i)}|\vecy_{1:T})}{\pi(\vecx^{(i)}|\vecy_{1:T})} \]
    • Challenge: assumes that we can evaluate \(p(\vecx^{(i)}|\vecy_{1:T})\)

Last time: Importance sampling

  • In general, \(p(\vecy_{1:T}|\vecx^{(i)})\) is easier to evaluate (measurement model)

  • Use Bayes' rule \[ p(\vecx^{(i)}|\vecy_{1:T}) = \frac{p(\vecy_{1:T}|\vecx^{(i)})p(\vecx^{(i)})}{\int p(\vecy_{1:T}|\vecx)p(\vecx)d\vecx} \] and note the denominator is still hard to compute!

  • Importance sampling to the rescue \[ \E{g(\vecx)|\vecy_{1:T}} \approx \sum_{i=1}^n \tilde{w}^{(i)} g(\vecx^{(i)})\textsf{ where } \tilde{w}^{(i)} \eqdef \frac{\frac{p(\vecy_{1:T}|\vecx^{(i)})p(\vecx^{(i)})}{\pi(\vecx^{(i)}|\vecy_{1:T})}}{\sum_{j=1}^n\frac{p(\vecy_{1:T}|\vecx^{(j)})p(\vecx^{(j)})}{\pi(\vecx^{(j)}|\vecy_{1:T})}} \]

Last time: Importance sampling algorithm

  • Assume measurement model \(p(\vecy_{1:T}|\vecx)\) and prior \(p(\vecx)\)
    1. Draw \(n\) samples \[ \vecx^{(i)}\sim \pi(\vecx|\vecy_{1:T})\quad i=1\dots n \]
    2. Compute unnormalized weights \[ v^{(i)} = \frac{p(\vecy_{1:T}|\vecx^{(i)})p(\vecx^{(i)})}{\pi(\vecx^{(i)}|\vecy_{1:T})} \]
    3. Compute normalized weights \[ w^{(i)} = \frac{v^{(i)}}{\sum_{j=1}^n v^{(j)}} \]
    4. Approximate posteriors \[ p(\vecx|\vecy_{1:T})\approx \sum_{i=1}^n w^{(i)}\delta(\vecx-\vecx^{(i)})\qquad \E{g(\vecx)|\vecy_{1:T}}\approx \sum_{i=1}^n w^{(i)} g(\vecx^{(i)}) \]
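  • A short sketch of the four steps above on a toy one-dimensional problem (all names and numbers are illustrative, not from the lecture): prior \(x\sim\calN(0,1)\), a single measurement \(y = x + r\) with \(r\sim\calN(0,0.5^2)\), and a Gaussian importance distribution centred at the observed value:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative 1-D setup: prior x ~ N(0, 1), measurement y = x + noise
# with noise ~ N(0, 0.5^2); the observed value is y = 0.8.
y_obs = 0.8
prior_pdf = lambda x: norm.pdf(x, loc=0.0, scale=1.0)
likelihood = lambda x: norm.pdf(y_obs, loc=x, scale=0.5)

# Importance distribution pi(x | y): a broad Gaussian centred at the data.
pi_mean, pi_std = y_obs, 1.0
pi_pdf = lambda x: norm.pdf(x, loc=pi_mean, scale=pi_std)

n = 10_000
x = rng.normal(loc=pi_mean, scale=pi_std, size=n)   # 1. draw samples from pi
v = likelihood(x) * prior_pdf(x) / pi_pdf(x)        # 2. unnormalized weights
w = v / v.sum()                                     # 3. normalized weights

# 4. approximate posterior expectations, e.g. the posterior mean and second moment
post_mean = np.sum(w * x)
post_second_moment = np.sum(w * x**2)
print(post_mean, post_second_moment)
```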

Last time: Sequential importance sampling

  • Sequential importance sampling is a variation that can be used for probabilistic state-space models with \[ \vecx_k\sim p(\vecx_k|\vecx_{k-1})\qquad \vecy_k\sim p(\vecy_k|\vecx_k) \]

  • Idea

    • Keep track of particles \(\set{(w^{(i)}_{k},\vecx^{(i)}_k):i=1\dots n}\) at every time \(k\) to compute \[ p(\vecx_k|\vecy_{1:k})\approx \sum_{i=1}^n w_k^{(i)}\delta(\vecx_k-\vecx_k^{(i)})\qquad \E{g(\vecx_k)|\vecy_{1:k}}\approx \sum_{i=1}^n w_k^{(i)} g(\vecx_k^{(i)}) \]
    • Use properties of state-space model to only keep track of current states at time \(k\), not entire history

Sequential importance sampling

  1. Draw \(n\) samples from the prior \[ \vecx_0^{(i)}\sim p(\vecx_0)\quad i=1\dots n \] and set \(w_0^{(i)}=\frac{1}{n}\) for \(i=1\dots n\)
  2. For each \(k=1\dots T\)
    1. Draw samples \(\vecx_k^{(i)}\) from importance distributions \[ \vecx_k^{(i)}\sim \pi(\vecx_k|\vecx^{(i)}_{0:k-1},\vecy_{1:k})\quad i=1\dots n \]
    2. Compute new weights \[ w_k^{(i)}\propto w_{k-1}^{(i)} \frac{p(\vecy_k|\vecx_k^{(i)})p(\vecx_k^{(i)}|\vecx_{k-1}^{(i)})}{\pi(\vecx_k^{(i)}|\vecx_{0:k-1}^{(i)},\vecy_{1:k})} \] and normalize
  • Try to choose \(\pi(\vecx_k|\vecx_{0:k-1},\vecy_{1:k})=\pi(\vecx_k|\vecx_{k-1},\vecy_{1:k})\)
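  • A sketch of sequential importance sampling for an illustrative one-dimensional linear-Gaussian state-space model (the model, its parameters, and the bootstrap choice \(\pi(\vecx_k|\vecx_{k-1},\vecy_{1:k}) = p(\vecx_k|\vecx_{k-1})\) are assumptions made here for concreteness; with that choice the transition density cancels and the weight update reduces to the likelihood):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative 1-D state-space model: x_k = 0.9 x_{k-1} + q_k,  y_k = x_k + r_k.
a, q_std, r_std = 0.9, 1.0, 0.5
T, n = 50, 500

# Simulate a trajectory to have some data to filter.
x_true = np.zeros(T + 1)
y = np.zeros(T + 1)
for k in range(1, T + 1):
    x_true[k] = a * x_true[k - 1] + q_std * rng.standard_normal()
    y[k] = x_true[k] + r_std * rng.standard_normal()

# 1. initialize particles from the prior x_0 ~ N(0, 1) with uniform weights
particles = rng.standard_normal(n)
w = np.full(n, 1.0 / n)

for k in range(1, T + 1):
    # 2a. propose from pi(x_k | x_{k-1}, y_{1:k}); here the transition prior
    #     p(x_k | x_{k-1}) (bootstrap choice), so the proposal term cancels below.
    particles = a * particles + q_std * rng.standard_normal(n)
    # 2b. weight update: w_k ∝ w_{k-1} * p(y_k | x_k), then normalize
    w = w * norm.pdf(y[k], loc=particles, scale=r_std)
    w = w / w.sum()

print("estimate:", np.sum(w * particles), "true state:", x_true[T])
```

  • Without resampling the weights eventually degenerate, which motivates the particle filter on the next slide.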

Particle filter

  • Resampling might be needed when most weights become negligible (the corresponding particles effectively disappear)
  1. Draw \(n\) samples from the prior \(\vecx_0^{(i)}\sim p(\vecx_0)\quad i=1\dots n\) and set \(w_0^{(i)}=\frac{1}{n}\) for \(i=1\dots n\)
  2. For each \(k=1\dots T\)
    1. Draw samples \(\vecx_k^{(i)}\) from importance distributions \[ \vecx_k^{(i)}\sim \pi(\vecx_k|\vecx^{(i)}_{0:k-1},\vecy_{1:k})\quad i=1\dots n \]
    2. Compute new weights \[ w_k^{(i)}\propto w_{k-1}^{(i)} \frac{p(\vecy_k|\vecx_k^{(i)})p(\vecx_k^{(i)}|\vecx_{k-1}^{(i)})}{\pi(\vecx_k^{(i)}|\vecx_{0:k-1}^{(i)},\vecy_{1:k})} \] and normalize
    3. If the effective number of particles gets too low, resample and reset to uniform weights \[ n_\textsf{eff}\approx\frac{1}{\sum_{i=1}^n (w_k^{(i)})^2} \]
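  • A minimal sketch of the resampling step (the multinomial scheme and the threshold \(n_\textsf{eff} < n/2\) are common choices assumed here, not prescribed by the lecture):

```python
import numpy as np

def effective_sample_size(w):
    # n_eff ≈ 1 / sum_i (w_i)^2 for normalized weights w
    return 1.0 / np.sum(w**2)

def resample(particles, w, rng):
    # Multinomial resampling: draw n indices with probabilities w,
    # keep the selected particles, and reset to uniform weights.
    n = len(particles)
    idx = rng.choice(n, size=n, p=w)
    return particles[idx], np.full(n, 1.0 / n)
```

  • Inside the sequential importance sampling loop above, this would be invoked after the weight normalization, e.g. `if effective_sample_size(w) < n / 2: particles, w = resample(particles, w, rng)`.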

Gaussian processes

  • What if we don't know part of the system?
    • Need to specify what we mean by "don't know": statistics, functional form?
  • Two solutions to extend our results
    1. Show robustness to uncertainty, i.e., show that the results still approximately hold with "unknown" disturbances
    2. Integrate the ability to learn what we don't know
  • Which solution to adopt depends on the problem, e.g., can we afford to learn?
  • Gaussian processes have emerged as a powerful tool to model unknown functions and learn them from samples

  • A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution

  • A Gaussian process is completely specified by a mean function \(m(\vecx)\) and a covariance function \(k(\vecx,\vecx')\) of a real-valued process \(f(\vecx)\) such that \[ m(\vecx)\eqdef \E{f(\vecx)}\qquad k(\vecx,\vecx') \eqdef \E{(f(\vecx)-m(\vecx))(f(\vecx')-m(\vecx'))} \]

  • Possible to generalize to vector-valued functions (more on this later), often assume \(m(\vecx)=0\)
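  • As a small illustration of the definition, the sketch below (assuming zero mean and, for concreteness, a squared-exponential covariance) evaluates the process on a finite grid of inputs, where by definition it is jointly Gaussian, and draws sample paths:

```python
import numpy as np

rng = np.random.default_rng(0)

def k(xa, xb):
    # Example covariance function: k(x, x') = exp(-0.5 * (x - x')^2) for scalar inputs
    return np.exp(-0.5 * (xa[:, None] - xb[None, :])**2)

x = np.linspace(-5.0, 5.0, 200)          # finite collection of input points
K = k(x, x) + 1e-9 * np.eye(len(x))      # covariance matrix (jitter for stability)

# By definition, (f(x_1), ..., f(x_200)) is jointly Gaussian with mean 0 and covariance K
f_samples = rng.multivariate_normal(mean=np.zeros(len(x)), cov=K, size=3)
```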

Gaussian Processes

  • Example of kernel: squared exponential kernel \[ k(\vecx,\vecx') \eqdef \exp\left(-\frac{1}{2}\norm[2]{\vecx-\vecx'}^2\right) \]
    • The kernel is a hyperparameter of the model
    • The kernel controls the smoothness of the functions we model
  • Key benefit of GPs: incorporate knowledge of observations of the function
    • Assume we know \(n\) observations \((\vecx_i,f_i)\)
    • Assume we would like to approximate the function at \(n^*\) test points \((\vecx_j^*,f_j^*)\)
    • The joint distribution is \[ \left[\begin{array}{c}\vecf\\\vecf^*\end{array}\right]\sim\calN\left(\boldsymbol{0},\left[\begin{array}{cc}\matK(X,X)&\matK(X,X^*)\\\matK(X^*,X)&\matK(X^*,X^*)\end{array}\right]\right) \]
    • How do we use this to estimate \(f^*\)?
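  • A sketch of how the joint distribution above is assembled in code, assuming the squared-exponential kernel and noise-free observations of an arbitrary test function (here \(\sin\), purely for illustration); the last two lines apply the Gaussian conditioning formula that the Schur complement discussion below makes precise:

```python
import numpy as np

def sq_exp_kernel(xa, xb):
    # k(x, x') = exp(-0.5 * (x - x')^2) for scalar inputs
    return np.exp(-0.5 * (xa[:, None] - xb[None, :])**2)

# Noise-free training observations (x_i, f_i) of an unknown function (here sin)
X = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])
f = np.sin(X)
Xs = np.linspace(-5.0, 5.0, 100)                   # test points x*

K = sq_exp_kernel(X, X) + 1e-9 * np.eye(len(X))    # K(X, X) with jitter
Ks = sq_exp_kernel(X, Xs)                          # K(X, X*)
Kss = sq_exp_kernel(Xs, Xs)                        # K(X*, X*)

# Conditioning the joint Gaussian on f gives the GP posterior at the test points
mean_star = Ks.T @ np.linalg.solve(K, f)           # E[f* | f]
cov_star = Kss - Ks.T @ np.linalg.solve(K, Ks)     # Cov[f* | f]
```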

Schur complement