Dimensionality reduction

Matthieu Bloch

Dimensionality reduction

  • Feature vectors in dataset \(\calD\) are \(\bfx_i\in\bbR^d\), with \(d\) potentially large

  • Dimensionality reduction aims at transforming inputs into new variables \(\bfx'_i\in\bbR^k\) with \(k\ll d\)
    • Needs to minimize information loss to maintain performance
    • Often improves computational efficiency
    • Helps prevent overfitting, especially when \(N\ll d\)
  • Curse of dimensionality
    • Volume grows exponentially with dimension, so a fixed amount of data covers the space ever more sparsely
    • Often translates into requiring exponentially more data for reliable results
    • Intuition can be misleading: in high dimensions, neighborhoods are no longer local
  • Example: for \(N\) points sampled i.i.d. uniformly in the unit ball of \(\bbR^p\), the median distance from the origin to the closest point is \(\bigl(1-(1/2)^{1/N}\bigr)^{1/p}\), which approaches \(1\) as \(p\) grows
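
This concentration effect is easy to check numerically. Below is a minimal simulation sketch (assuming NumPy; the helper `sample_unit_ball` is illustrative, not from the lecture) comparing the empirical median closest-point distance with the closed-form value above.

```python
# Curse of dimensionality: the median distance from the origin to the
# closest of N points drawn uniformly in the unit ball of R^p tends to 1
# as p grows, so "nearest" neighbors are no longer nearby.
import numpy as np

rng = np.random.default_rng(0)

def sample_unit_ball(n, p):
    """Draw n points uniformly at random from the unit ball of R^p."""
    g = rng.standard_normal((n, p))
    g /= np.linalg.norm(g, axis=1, keepdims=True)  # uniform direction on the sphere
    r = rng.uniform(size=(n, 1)) ** (1.0 / p)      # radius with CDF r^p
    return g * r

N, trials = 500, 200
for p in (2, 10, 50):
    closest = [np.linalg.norm(sample_unit_ball(N, p), axis=1).min()
               for _ in range(trials)]
    theory = (1 - 0.5 ** (1 / N)) ** (1 / p)       # median of min_i ||x_i||
    print(f"p={p:2d}: empirical {np.median(closest):.3f}, theory {theory:.3f}")
```

With \(N=500\), the formula gives roughly \(0.04\) for \(p=2\), \(0.52\) for \(p=10\), and \(0.88\) for \(p=50\): almost all of the ball's volume, and hence of the data, sits near its boundary.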

Types of dimensionality reduction techniques

  • Information loss: how is it measured?

  • Supervised vs unsupervised: are the labels \(y_i\) used?

  • Linear vs nonlinear: is the map \(\bfx \to \bfx'\) linear or nonlinear?

  • Selection vs extraction: are features selected or extracted (see the sketch after this list)? \[\bfx'_{\textsf{select}}=\mat{c}{x_1\\x_6\\x_{32}}\quad\textsf{vs}\quad\bfx'_{\textsf{extract}}=\mat{c}{\phi_1(\bfx)\\\phi_2(\bfx)\\\phi_3(\bfx)}\]

  • We will discuss several examples
    • Filter methods: supervised selection methods
    • Fisher discriminant analysis: supervised extraction method
    • Principal component analysis: unsupervised extraction method
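
As a concrete illustration of the selection/extraction distinction, here is a minimal sketch (assuming NumPy; the selected indices and the maps \(\phi_i\) are arbitrary examples, not the lecture's choices):

```python
# Selection keeps a subset of the original coordinates; extraction builds
# new features as functions of all coordinates. Indices and maps below are
# illustrative, matching the (x_1, x_6, x_32) example above.
import numpy as np

x = np.linspace(0.0, 1.0, 40)              # one feature vector in R^d, d = 40

# Selection: keep coordinates x_1, x_6, x_32 (0-based indices 0, 5, 31).
x_select = x[[0, 5, 31]]

# Extraction: phi_1, phi_2, phi_3 are example maps of the full vector.
x_extract = np.array([x.mean(),            # phi_1: average of all coordinates
                      np.linalg.norm(x),   # phi_2: Euclidean norm
                      x.max() - x.min()])  # phi_3: range

print(x_select)    # still original, interpretable coordinates
print(x_extract)   # derived features, each depending on every coordinate
```

Selected features remain interpretable original coordinates, whereas each extracted feature mixes information from the entire input vector.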