Matthieu R. Bloch
Back to our training dataset \(\calD\eqdef\{(\bfx_1,y_1),\cdots,(\bfx_N,y_N)\}\)
The nearest-neighbor (NN) classifier is \(h^{\text{NN}}(\bfx)\eqdef y_{\text{NN}(\bfx)}\) where \(\text{NN}(\bfx)\eqdef \argmin_i \norm{\bfx_i-\bfx}\)
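As a minimal sketch of this rule (assuming NumPy arrays; the names X_train, y_train, and nn_classify are illustrative, not from the notes):

```python
import numpy as np

def nn_classify(X_train: np.ndarray, y_train: np.ndarray, x: np.ndarray):
    """1-NN rule: return the label of the training point closest to x."""
    # Euclidean distances ||x_i - x|| to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # NN(x) = argmin_i ||x_i - x||
    i_star = np.argmin(dists)
    return y_train[i_star]
```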
Let \(\bfx\), \(\{\bfx_i\}_{i=1}^N\) be i.i.d. \(\sim P_{\bfx}\) in a separable metric space \(\calX\). Let \(\bfx_{\text{NN}(\bfx)}\) be the nearest neighbor of \(\bfx\). Then \(\bfx_{\text{NN}(\bfx)} \to \bfx\) with probability one as \(N\to\infty\)
Let \(\calX\) be a separable metric space. Let \(p(\bfx|y=0)\), \(p(\bfx|y=1)\) be such that, with probability one, \(\bfx\) is either a continuity point of both \(p(\bfx|y=0)\) and \(p(\bfx|y=1)\) or a point of non-zero probability measure. Then, as \(N\to\infty\), the NN risk satisfies \[R(h^{\text{B}}) \leq R(h^{\text{NN}})\leq 2R(h^{\text{B}})(1-R(h^{\text{B}}))\]
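To make the bound concrete, plug in a hypothetical Bayes risk: if \(R(h^{\text{B}}) = 0.1\), then \(0.1 \leq R(h^{\text{NN}}) \leq 2(0.1)(0.9) = 0.18\), i.e., the NN classifier is asymptotically at most (roughly) twice as bad as the optimal Bayes classifier.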
Let \(\hat{h}_N\) be a classifier learned from \(N\) data points; \(\hat{h}_N\) is consistent if \(\E{R(\hat{h}_N)}\to R(h^{\text{B}})\) as \(N\to\infty\).
If \(K\to\infty\) and \(K/N\to 0\) as \(N\to\infty\), then \(h^{\text{$K$-NN}}\) is consistent (Stone's theorem)
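A minimal \(K\)-NN sketch, paired with one common schedule \(K = \lceil\sqrt{N}\rceil\) that satisfies both conditions (\(K\to\infty\), \(K/N\to 0\)); the function names are assumptions for illustration, not from the notes:

```python
import numpy as np

def knn_classify(X_train: np.ndarray, y_train: np.ndarray, x: np.ndarray, K: int):
    """K-NN rule: majority vote over the K closest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:K]           # indices of the K nearest neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]          # majority label

def consistent_K(N: int) -> int:
    # One schedule meeting the consistency conditions: K -> infinity, K/N -> 0
    return int(np.ceil(np.sqrt(N)))
```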
Given enough data, a \(K\)-NN classifier does as well as any other method: its expected risk converges to the Bayes risk \(R(h^{\text{B}})\), which no classifier can beat
The number of samples \(N\) required can be huge, especially in high dimension
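A back-of-the-envelope illustration of why (not from the notes): to guarantee every query in \([0,1]^d\) has a training point within distance \(\epsilon\), one needs on the order of \(\epsilon^{-d}\) points, so \(N\) grows exponentially with the dimension \(d\); already for \(\epsilon = 0.1\) and \(d = 20\), that is on the order of \(10^{20}\) samples.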