Ex. 13.3
Let \(E^\ast\) be the error rate of the Bayes rule in a \(K\)-class problem, where the true class probabilities are given by \(p_k(x)\), \(k=1,\ldots,K\). Assuming the test point and training point have identical features \(x\), prove (13.5):
\[
\sum_{k=1}^K p_k(x)\left(1-p_k(x)\right) \le 2\left(1-p_{k^\ast}(x)\right) - \frac{K}{K-1}\left(1-p_{k^\ast}(x)\right)^2,
\]
where \(k^\ast = \underset{k}{\operatorname{argmax}}\,p_k(x)\). Hence argue that the error rate of the 1-nearest-neighbor rule converges in \(L_1\), as the size of the training set increases, to a value \(E_1\), bounded above by
\[
E^\ast\left(2 - E^\ast\frac{K}{K-1}\right).
\]
[This statement of the theorem of \cite{cover1967nearest} is taken from Chapter 6 of \cite{ripley1996pattern}, where a short proof is also given].
Soln. 13.3
We write
\[
\sum_{k=1}^K p_k(x)\left(1-p_k(x)\right) = 1 - \sum_{k=1}^K p_k(x)^2 = 1 - p_{k^\ast}(x)^2 - \sum_{i\neq k^\ast} p_i(x)^2,
\]
and we need to minimize \(\sum_{i\neq k^\ast} p_i(x)^2\). It is easy to see (consider its Lagrangian) that this sum is minimized when the \(p_i(x)\), \(i\neq k^\ast\), are all equal. With the constraint \(\sum_{i=1}^K p_i(x)=1\), this gives \(p_i(x) = \frac{1}{K-1}\left(1-p_{k^\ast}(x)\right)\) for \(i\neq k^\ast\). Plugging this into the equation above, simple algebra yields
\begin{equation}\label{eq:13-3b}
\sum_{k=1}^K p_k(x)\left(1-p_k(x)\right) \le 2\left(1-p_{k^\ast}(x)\right) - \frac{K}{K-1}\left(1-p_{k^\ast}(x)\right)^2,
\end{equation}
which is exactly (13.5).
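For completeness, here is the Lagrangian step spelled out; the shorthand \(q = 1-p_{k^\ast}(x)\), the multiplier \(\lambda\), and the abbreviation \(p_i\) for \(p_i(x)\) are introduced only for this sketch:
\[
\mathcal{L} = \sum_{i\neq k^\ast} p_i^2 - \lambda\Bigl(\sum_{i\neq k^\ast} p_i - q\Bigr), \qquad \frac{\partial \mathcal{L}}{\partial p_i} = 2p_i - \lambda = 0 \ \Rightarrow\ p_i = \frac{\lambda}{2} = \frac{q}{K-1},
\]
so that \(\sum_{i\neq k^\ast} p_i^2 \ge \frac{q^2}{K-1}\), and hence
\[
\sum_{k=1}^K p_k(x)\left(1-p_k(x)\right) \le 1 - (1-q)^2 - \frac{q^2}{K-1} = 2q - q^2 - \frac{q^2}{K-1} = 2q - \frac{K}{K-1}q^2,
\]
which is \eqref{eq:13-3b} after substituting back \(q = 1-p_{k^\ast}(x)\).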
By definition we have
\[
E^\ast = \mathbb{E}\left[1 - p_{k^\ast}(X)\right]
\]
and
\[
E_1 = \mathbb{E}\left[\sum_{k=1}^K p_k(X)\left(1-p_k(X)\right)\right],
\]
since, conditional on \(X=x\), the test label and the (asymptotically coincident) nearest-neighbor label are independent draws from \(\{p_k(x)\}\), and they disagree with probability \(\sum_k p_k(x)\left(1-p_k(x)\right)\).
Therefore, by \eqref{eq:13-3b} we obtain
\[
\begin{aligned}
E_1 &\le 2\,\mathbb{E}\left[1-p_{k^\ast}(X)\right] - \frac{K}{K-1}\,\mathbb{E}\left[\left(1-p_{k^\ast}(X)\right)^2\right]\\
&\le 2E^\ast - \frac{K}{K-1}\left(E^\ast\right)^2 = E^\ast\left(2 - E^\ast\frac{K}{K-1}\right),
\end{aligned}
\]
where the second inequality uses \(\mathbb{E}[Z^2]\ge(\mathbb{E}Z)^2\) with \(Z = 1-p_{k^\ast}(X)\).
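As an illustration (not part of the original solution), here is a minimal numerical sanity check of the bound; it assumes NumPy is available, and the Dirichlet distribution for the class probabilities is an arbitrary choice made only for the simulation:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
K, n = 3, 100_000  # number of classes and of simulated feature points

# Random class-probability vectors p(x), one row per feature point.
p = rng.dirichlet(np.ones(K), size=n)

bayes = 1.0 - p.max(axis=1)        # conditional Bayes error 1 - p_{k*}(x)
nn = (p * (1.0 - p)).sum(axis=1)   # conditional asymptotic 1-NN error

E_star, E_1 = bayes.mean(), nn.mean()
bound = E_star * (2.0 - E_star * K / (K - 1))
print(E_star, E_1, bound)          # E_1 should not exceed the bound
assert E_star <= E_1 <= bound
\end{verbatim}
The assertion also checks \(E^\ast \le E_1\), which holds pointwise since \(\sum_k p_k(x)(1-p_k(x)) \ge \sum_k p_k(x)(1-p_{k^\ast}(x)) = 1-p_{k^\ast}(x)\).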