Matthieu R Bloch
Dataset \(\{(\bfx_i,y_i)\}_{i=1}^N\) is linearly separable if there exists \(\bfw\in\bbR^d\) and \(b\in\bbR\) such that \[\forall i\in\intseq{1}{N}\quad y_i=\sgn{\bfw^\intercal\bfx_i+b} \qquad y_i\in \{\pm 1\}\] By definition \(\sgn{x}=+1\) if \(x> 0\) and \(-1\) otherwise. The affine set \(\{\bfx:\bfw^\intercal\bfx+b=0\}\) is called a separating hyperplane.
Consider the hyperplane \(\calH\eqdef \{\bfx:\bfw^\intercal\bfx+b=0\}\). The vector \(\bfw\) is orthogonal to all vectors parallel to the hyperplane. For \(\bfz\in\bbR^d\), the distance of \(\bfz\) to the hyperplane is \[d(\bfz,\calH)=\frac{\abs{\bfw^\intercal\bfz+b }}{\norm[2]{\bfw}}\]
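This distance is straightforward to evaluate numerically. A minimal sketch assuming NumPy (the function name is illustrative):

```python
import numpy as np

def distance_to_hyperplane(z, w, b):
    """Distance from point z to the hyperplane {x : w^T x + b = 0}."""
    return abs(w @ z + b) / np.linalg.norm(w)
```

For any \(\bfz\) on the hyperplane itself, the numerator vanishes and the distance is \(0\), as expected.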
Data \(\calD\eqdef\{(\bfx_i,y_i)\}_{i=1}^N\) such that \(y_i\in\{\pm 1\}\)
Perceptron Learning Algorithm (PLA) invented by Rosenblatt in 1958 to find separating hyperplanes
PLA finds a separating hyperplane if the data is linearly separable
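A minimal sketch of the PLA, assuming NumPy (names and the iteration cap are illustrative): each time a point is misclassified, the hyperplane is nudged toward it; for linearly separable data the loop is guaranteed to terminate.

```python
import numpy as np

def perceptron(X, y, max_iters=10_000):
    """PLA: find (w, b) separating data X (N x d) with labels y in {-1, +1}."""
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_iters):
        updated = False
        for i in range(N):
            if y[i] * (w @ X[i] + b) <= 0:   # misclassified (or on the boundary)
                w += y[i] * X[i]             # nudge hyperplane toward the point
                b += y[i]
                updated = True
        if not updated:                      # no misclassified points: done
            return w, b
    raise RuntimeError("no convergence; data may not be linearly separable")
```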
“All separating hyperplanes are equal but some are more equal than others”
The canonical form \((\bfw,b)\) of a separating hyperplane is such that \[\forall i\,\, y_i(\bfw^\intercal\bfx_i+b) \geq 1\text{ and }\exists i^* \text{ s.t. }y_{i^*}(\bfw^\intercal\bfx_{i^*}+b) = 1\]
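Since the separation conditions \(y_i(\bfw^\intercal\bfx_i+b)>0\) are invariant to scaling \((\bfw,b)\) by any positive constant, any separating hyperplane can be brought to canonical form by dividing by the margin of the closest point. A minimal sketch, assuming NumPy and a separating pair (w, b):

```python
import numpy as np

def canonical_form(w, b, X, y):
    """Rescale a separating (w, b) so that min_i y_i (w^T x_i + b) = 1."""
    m = np.min(y * (X @ w + b))   # margin of the closest point; > 0 if (w, b) separates
    return w / m, b / m
```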
The optimal soft-margin hyperplane is the solution of the following \[\argmin_{\bfw,b,\boldsymbol{\xi}}\frac{1}{2}\norm[2]{\bfw}^2+\frac{C}{N}\sum_{i=1}^N\xi_i\text{ s.t. }\forall i\quad y_i(\bfw^\intercal\bfx_i+b)\geq 1-\xi_i\text{ and }\xi_i\geq 0\]
\(C>0\) is a cost hyperparameter set by the user, which controls the influence of outliers: a larger \(C\) penalizes the slack variables \(\xi_i\) more heavily and approaches the hard-margin solution, while a smaller \(C\) tolerates more margin violations.
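The optimization above is a quadratic program and can be handed directly to a generic convex solver. A minimal sketch using cvxpy (an assumption; the notes do not prescribe a solver), matching the objective and constraints above including the \(C/N\) scaling:

```python
import numpy as np
import cvxpy as cp

def soft_margin_svm(X, y, C=1.0):
    """Solve the soft-margin hyperplane QP for X (N x d), y in {-1, +1}."""
    N, d = X.shape
    w, b, xi = cp.Variable(d), cp.Variable(), cp.Variable(N)
    objective = cp.Minimize(0.5 * cp.sum_squares(w) + (C / N) * cp.sum(xi))
    constraints = [cp.multiply(y, X @ w + b) >= 1 - xi,  # margin constraints
                   xi >= 0]                              # nonnegative slacks
    cp.Problem(objective, constraints).solve()
    return w.value, b.value, xi.value
```

At the optimum, points with \(\xi_i = 0\) satisfy the hard-margin constraint, while \(\xi_i > 0\) flags points inside the margin or misclassified.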