Matthieu Bloch
Feature extraction method: unsupervised, linear, based on minimizing a sum of squared errors
The idea is to approximate the data as \[\bfx_i \approx \bfmu + \bfA\bftheta_i\textsf{ with }\bfmu\in\bbR^d,\bfA\in\bbR^{d\times k},\bftheta_i\in\bbR^{k} \] where \(\bfA\) has orthonormal columns
Hard part is finding \(\bfA\)
Assume that \(\bfmu\) and \(\bfA\) are fixed. Then, \[\bftheta_i=\bfA^{\intercal}(\bfx_i-\bfmu)\]
Assume \(\bfA\) is fixed and \(\bftheta_i = \bfA^\intercal(\bfx_i-\bfmu)\). Then, \[\bfmu=\frac{1}{N}\sum_{i=1}^N\bfx_i\]
One possible choice of \(\bfA\) is \[\bfA=[\bfu_1,\cdots,\bfu_k]\] where the \(\bfu_i\)’s are the eigenvectors corresponding to the \(k\) largest eigenvalues of the scatter matrix \(\bfS\eqdef\sum_{i=1}^N(\bfx_i-\bfmu)(\bfx_i-\bfmu)^\intercal\)
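A minimal numerical sketch of the full recipe in NumPy (the data matrix `X`, the seed, and the choice `k = 2` are illustrative, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))         # N = 100 samples, d = 5 features

# mu: the sample mean (the SSE-optimal offset)
mu = X.mean(axis=0)
Xc = X - mu

# scatter matrix of the centered data; eigh since S is symmetric
S = Xc.T @ Xc
eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order

k = 2
A = eigvecs[:, ::-1][:, :k]           # columns = top-k eigenvectors

# theta_i = A^T (x_i - mu), reconstruction x_hat_i = mu + A theta_i
Theta = Xc @ A
X_hat = mu + Theta @ A.T

# the residual SSE equals the sum of the discarded eigenvalues of S
sse = np.sum((X - X_hat) ** 2)
```

The last line is a useful sanity check: the sum-of-squares reconstruction error of the rank-\(k\) approximation is exactly the sum of the \(d-k\) smallest eigenvalues of \(\bfS\).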
Customary to center and scale a data set so that it has zero mean and unit variance along each feature
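A sketch of this preprocessing step (the data `X` below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=3.0, scale=2.0, size=(200, 4))

# subtract the per-feature mean, divide by the per-feature standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```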
Typically select \(k\) such that the residual error \(\sum_{j=k+1}^{d}\lambda_j\) (the sum of the discarded eigenvalues of \(\bfS\)) is small relative to \(\sum_{j=1}^{d}\lambda_j\)
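One common way to make "residual error is small" concrete is to pick the smallest \(k\) whose discarded eigenvalues account for less than some fraction of the total; a sketch, where the 5% threshold and the synthetic low-rank data are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
# synthetic data: 3 strong directions embedded in 10 dimensions, plus noise
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10)) \
    + 0.1 * rng.normal(size=(500, 10))

Xc = X - X.mean(axis=0)
eigvals = np.linalg.eigvalsh(Xc.T @ Xc)[::-1]     # descending order

# fraction of the total SSE remaining after keeping k = 1, 2, ..., d components
residual_frac = 1.0 - np.cumsum(eigvals) / eigvals.sum()

# smallest k whose residual fraction drops below the (illustrative) 5% threshold
k = int(np.argmax(residual_frac < 0.05)) + 1
```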