Dimensionality reduction

Matthieu Bloch

Dimensionality reduction

  • Feature vectors in dataset \(\calD\) are \(\bfx_i\in\bbR^d\), with \(d\) potentially large

  • Dimensionality reduction aims at transforming inputs into new variables \(\bfx'_i\in\bbR^k\) with \(k\ll d\)
    • Needs to minimize information loss to maintain performance
    • Often improves computational efficiency
    • Helps prevent overfitting, especially when \(N\ll d\)
  • Curse of dimensionality
    • Volume grows exponentially with dimension, so a fixed amount of data covers the space ever more sparsely
    • Often translates into requiring exponentially more data for reliable results
    • Intuition can be misleading: in high dimensions, neighborhoods are no longer local
  • Example: for \(N\) points sampled i.i.d. uniformly in the unit ball of \(\bbR^p\), the median distance from the origin to the closest point is \(\bigl(1-(1/2)^{1/N}\bigr)^{1/p}\), which approaches \(1\) as \(p\) grows
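
This concentration effect is easy to check numerically. Below is a minimal simulation sketch (assuming NumPy; the helper `sample_unit_ball` is illustrative, not from the lecture) comparing the empirical median closest-point distance with the closed-form value above.

```python
# Curse of dimensionality: the median distance from the origin to the
# closest of N points drawn uniformly in the unit ball of R^p tends to 1
# as p grows, so "nearest" neighbors are no longer nearby.
import numpy as np

rng = np.random.default_rng(0)

def sample_unit_ball(n, p):
    """Draw n points uniformly at random from the unit ball of R^p."""
    g = rng.standard_normal((n, p))
    g /= np.linalg.norm(g, axis=1, keepdims=True)  # uniform direction on the sphere
    r = rng.uniform(size=(n, 1)) ** (1.0 / p)      # radius with CDF r^p
    return g * r

N, trials = 500, 200
for p in (2, 10, 50):
    closest = [np.linalg.norm(sample_unit_ball(N, p), axis=1).min()
               for _ in range(trials)]
    theory = (1 - 0.5 ** (1 / N)) ** (1 / p)       # median of min_i ||x_i||
    print(f"p={p:2d}: empirical {np.median(closest):.3f}, theory {theory:.3f}")
```

With \(N=500\), the formula gives roughly \(0.04\) for \(p=2\), \(0.52\) for \(p=10\), and \(0.88\) for \(p=50\): almost all of the ball's volume, and hence of the data, sits near its boundary.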

Types of dimensionality reduction techniques

  • Information loss: how is it measured?

  • Supervised vs unsupervised: are the labels \(y_i\) used?

  • Linear vs nonlinear: is the map \(\bfx \to \bfx'\) linear or nonlinear?

  • Selection vs extraction: are features selected or extracted (see the sketch after this list)? \[\bfx'_{\textsf{select}}=\mat{c}{x_1\\x_6\\x_{32}}\quad\textsf{vs}\quad\bfx'_{\textsf{extract}}=\mat{c}{\phi_1(\bfx)\\\phi_2(\bfx)\\\phi_3(\bfx)}\]

  • We will discuss several examples
    • Filter methods: supervised selection methods
    • Fisher discriminant analysis: supervised extraction method
    • Principal component analysis: unsupervised extraction method
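
As a concrete illustration of the selection/extraction distinction, here is a minimal sketch (assuming NumPy; the selected indices and the maps \(\phi_i\) are arbitrary examples, not the lecture's choices):

```python
# Selection keeps a subset of the original coordinates; extraction builds
# new features as functions of all coordinates. Indices and maps below are
# illustrative, matching the (x_1, x_6, x_32) example above.
import numpy as np

x = np.linspace(0.0, 1.0, 40)              # one feature vector in R^d, d = 40

# Selection: keep coordinates x_1, x_6, x_32 (0-based indices 0, 5, 31).
x_select = x[[0, 5, 31]]

# Extraction: phi_1, phi_2, phi_3 are example maps of the full vector.
x_extract = np.array([x.mean(),            # phi_1: average of all coordinates
                      np.linalg.norm(x),   # phi_2: Euclidean norm
                      x.max() - x.min()])  # phi_3: range

print(x_select)    # still original, interpretable coordinates
print(x_extract)   # derived features, each depending on every coordinate
```

Selected features remain interpretable original coordinates, whereas each extracted feature mixes information from the entire input vector.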