Fisher Linear Discriminant Analysis

Matthieu Bloch

Fisher LDA

Fisher LDA is a supervised dimensionality reduction technique.
Consider a dataset \(\set{(\bfx_i,y_i)}_{i=1}^N\) with \(K\) classes \(k\in\intseq{1}{K}\).
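- Define the mean feature vector \(\bar{\bfx}\eqdef\frac{1}{N}\sum_{i=1}^N \bfx_i\).
- Define the mean feature vector of class \(k\) as \(\bfmu_k\eqdef \frac{1}{N_k}\sum_{i=1}^N \bfx_i\indic{y_i=k}\), where \(N_k\eqdef\sum_{i=1}^N\indic{y_i=k}\) is the number of samples in class \(k\).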
Fisher LDA attempts to find the directions \(\bfw\) that best discriminate the labels by maximizing the objective \[ J(\bfw) = \frac{\bfw^\intercal S_B\bfw}{\bfw^\intercal S_W\bfw} \] with \[ S_B\eqdef \sum_{k=1}^K (\bfmu_k-\bar{\bfx})(\bfmu_k-\bar{\bfx})^\intercal \textsf{ and } S_W\eqdef \sum_{k=1}^K \sum_{i=1}^N \indic{y_i=k}(\bfx_i-\bfmu_k)(\bfx_i-\bfmu_k)^\intercal \]
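\(S_B\) is called the between-class scatter matrix; it measures how far the class means spread around the overall mean.
\(S_W\) is called the within-class scatter matrix; it measures how far the samples spread around their own class mean.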
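
The definitions of \(S_B\) and \(S_W\) above can be sketched in a few lines of code. This is only an illustration, assuming NumPy is available; the function name `scatter_matrices` and the array layout (one sample per row) are choices made here, not part of the original notes. It follows the unweighted form of \(S_B\) given above.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_B, S_W) for features X of shape (N, d) and labels y of shape (N,).

    Hypothetical helper: uses the unweighted between-class scatter from the text.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    x_bar = X.mean(axis=0)              # overall mean feature vector
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for k in np.unique(y):
        X_k = X[y == k]                 # samples with label k
        mu_k = X_k.mean(axis=0)         # class-k mean feature vector
        diff = (mu_k - x_bar)[:, None]  # column vector mu_k - x_bar
        S_B += diff @ diff.T
        centered = X_k - mu_k
        S_W += centered.T @ centered    # sum of outer products within class k
    return S_B, S_W
```

For instance, calling `scatter_matrices(X, y)` on an `(N, d)` array returns two symmetric `(d, d)` matrices that can be plugged directly into \(J(\bfw)\).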

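The direction that maximizes \(J(\bfw)\) solves the generalized eigenvalue problem \(S_B\bfw=\lambda S_W\bfw\) for the largest \(\lambda\). Assuming \(S_W\) is invertible, it is obtained as \(\bfw = S_W^{-1}S_B^{\frac{1}{2}}\bfv\), where \(\bfv\) is an eigenvector associated with the largest eigenvalue of \[ S_B^{\frac{1}{2}}S_W^{-1}S_B^{\frac{1}{2}} \]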
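
A minimal sketch of this eigenvector computation follows, assuming NumPy and an invertible \(S_W\); the function name `fisher_direction` is hypothetical. It takes the top eigenvector \(\bfv\) of the symmetric matrix above and maps it back through \(S_W^{-1}S_B^{\frac{1}{2}}\bfv\), so that the returned vector satisfies \(S_B\bfw=\lambda S_W\bfw\) and can be used directly as a projection direction.

```python
import numpy as np

def fisher_direction(S_B, S_W):
    """Return a unit-norm direction w maximizing (w' S_B w) / (w' S_W w).

    Hypothetical helper; assumes S_B is symmetric PSD and S_W is invertible.
    """
    # Symmetric square root of S_B from its eigendecomposition.
    evals, evecs = np.linalg.eigh(S_B)
    sqrt_S_B = (evecs * np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    # M = S_B^{1/2} S_W^{-1} S_B^{1/2}; solve() avoids forming the inverse.
    M = sqrt_S_B @ np.linalg.solve(S_W, sqrt_S_B)
    M = (M + M.T) / 2.0                 # remove round-off asymmetry
    _, V = np.linalg.eigh(M)            # eigenvalues in ascending order
    v = V[:, -1]                        # eigenvector of the largest eigenvalue
    # Map back: w = S_W^{-1} S_B^{1/2} v.
    w = np.linalg.solve(S_W, sqrt_S_B @ v)
    return w / np.linalg.norm(w)
```

The sign of the returned vector is arbitrary, since \(J(\bfw)=J(-\bfw)\). Projecting the data is then a matter of computing `X @ w`.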