Matthieu Bloch
Clustering problem: given samples \(\{\bfx_i\}_{i=1}^N\) assign points to disjoint subsets called clusters, so that points in the same cluster are more similar to each other than points in different clusters
Clustering is map \(C:[1:N]\to[1;K]\) with \(K\) the number of clusters; how do we choose \(C\)
\[W(C) = \frac{1}{2}\sum_{k=1}^K \sum_{i:C(i)=k}\sum_{j:C(j)=k}\norm[2]{\bfx_i-\bfx_{j}}^2\]
\(K\)-means clustering: find \(C\) minimizing \(W(C)\)
The number of possible clusters is given by the Stirling’s numbers of the second kind \[S(N,K)=\frac{1}{K!}\sum_{k=1}^K(-1)^{K-k}{K \choose k}k^N\]
We want to find \(C^*=\min_C W(C)\)
\[C^* = \argmin_{C}\sum_{k=1}^KN_k\sum_{C(i)=k}\norm[2]{\bfx_i-\bfmu_k}^2\] where \(\bfmu_k=\frac{1}{N_k}\sum_{i:C(i)=k}\bfx_i\) and \(N_k=\card{\{i:C(i)=k\}}\)
Solve enlarged optimization problem \[\argmin_{C,\{\bfm_k\}}\sum_{k=1}^KN_k\sum_{C(i)=k}\norm[2]{\bfx_i-\bfm_k}^2\]
MLE of single multivariate Gaussian
MLE of mixture of multivariate Gaussian with incomplete data
Efficient algorithm to address incomplete data
Key idea is to work with MLE for complete data and average out unobserved hidden state