Ex. 13.3

Ex. 13.3

Let \(E^\ast\) be the error rate of the Bayes rule in a \(K\)-class problem, where the true class probabilities are given by \(p_k(x)\), \(k=1,...,K\). Assuming the test point and training point have identical features \(x\), prove (13.5)

\[\begin{equation} \sum_{k=1}^Kp_k(x)(1-p_k(x)) \le 2(1-p_{k^\ast}(x))-\frac{K}{K-1}(1-p_{k^\ast}(x))^2,\non \end{equation}\]

where \(k^\ast = \underset{k}{\operatorname{argmax}}p_k(x)\). Hence argue that the error rate of the 1-nearest-neighbor rule converge in \(L_1\), as the size of the training set increases, to a value \(E_1\), bounded above by

\[\begin{equation} E^\ast\left(2-E^\ast\frac{K}{K-1}\right).\non \end{equation}\]

[This statement of the theorem of \cite{cover1967nearest} is taken from Chapter 6 of \cite{ripley1996pattern}, where a short proof is also given].

Soln. 13.3

We write

\[\begin{equation} \sum_{k=1}^Kp_k(x)(1-p_k(x)) = 1 -\sum_{k=1}^Kp_k(x)^2 = 1-p_{k^\ast}(x)^2 - \sum_{i\neq k^\ast}^K p_i(x)^2,\non \end{equation}\]

and we need to minimize \(\sum_{i\neq k^\ast}^K p_i(x)^2\). It's easy to see (consider its Lagrangian) that it's minimized when each \(p_i(x)\) for \(i\neq k^\ast\) are equal. With the constraint that \(\sum_{i=1}^Kp_i(x)=1\), we get \(p_i(x) = \frac{1}{K-1}(1-p_{k^\ast}(x))\) for \(i\neq k^\ast\). Plug it into the equation with simple algebra we obtain

\[\begin{eqnarray} \sum_{k=1}^Kp_k(x)(1-p_k(x)) &\le& 1-p_{k^\ast}(x)^2 - (K-1)\left(\frac{1}{K-1}(1-p_{k^\ast}(x))\right)^2\non\\ &=&1-p_{k^\ast}(x)^2 - \frac{(1-p_{k^\ast}(x))^2}{K-1}\non\\ %&=&1-p_{k^\ast}(x)^2 + (1-p_{k^\ast}(x))^2 - (1-p_{k^\ast}(x))^2 - \frac{(1-p_{k^\ast}(x))^2}{K-1}\non\\ &=&2(1-p_{k^\ast}(x)) - \frac{K}{K-1}(1-p_{k^\ast}(x))^2.\label{eq:13-3b} \end{eqnarray}\]

By definition we have

\[\begin{equation} E_1 = \int \left[\sum_{k=1}^Kp_k(x)(1-p_k(x))\right]p(x)dx\non \end{equation}\]

and

\[\begin{equation} E^\ast = \int \left[1-p_{k^\ast}(x)\right]p(x)dx.\non \end{equation}\]

Therefore, by \(\eqref{eq:13-3b}\) we obtain

\[\begin{equation} E_1 \le E^\ast\left(2-E^\ast\frac{K}{K-1}\right).\non \end{equation}\]