Ex. 7.2

For 0-1 loss with $Y \in \{0, 1\}$ and $\Pr(Y = 1 \mid x_0) = f(x_0)$, show that

$$\mathrm{Err}(x_0) = \Pr(Y \neq \hat G(x_0) \mid X = x_0) = \mathrm{Err}_B(x_0) + |2f(x_0) - 1|\Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0),$$

where $\hat G(x) = \mathbf{1}(\hat f(x) > 1/2)$, $G(x) = \mathbf{1}(f(x) > 1/2)$ is the Bayes classifier, and $\mathrm{Err}_B(x_0) = \Pr(Y \neq G(x_0) \mid X = x_0)$, the irreducible Bayes error at $x_0$. Using the approximation $\hat f(x_0) \sim N(E\hat f(x_0), \mathrm{Var}(\hat f(x_0)))$, show that

$$\Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0) \approx \Phi\left(\frac{\mathrm{sign}(\tfrac{1}{2} - f(x_0))\,(E\hat f(x_0) - \tfrac{1}{2})}{\sqrt{\mathrm{Var}(\hat f(x_0))}}\right).$$

In the above

$$\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} \exp(-u^2/2)\, du,$$

the cumulative Gaussian distribution function. This is an increasing function, with value 0 at $t = -\infty$ and value 1 at $t = +\infty$.

We can think of $\mathrm{sign}(\tfrac{1}{2} - f(x_0))(E\hat f(x_0) - \tfrac{1}{2})$ as a kind of boundary-bias term, as it depends on the true $f(x_0)$ only through which side of the boundary ($\tfrac{1}{2}$) it lies on. Notice also that the bias and variance combine in a multiplicative rather than additive fashion. If $E\hat f(x_0)$ is on the same side of $\tfrac{1}{2}$ as $f(x_0)$, then the bias is negative, and decreasing the variance will decrease the misclassification error. On the other hand, if $E\hat f(x_0)$ is on the opposite side of $\tfrac{1}{2}$ to $f(x_0)$, then the bias is positive and it pays to increase the variance! Such an increase will improve the chance that $\hat f(x_0)$ falls on the correct side of $\tfrac{1}{2}$ (Friedman, "On bias, variance, 0/1-loss, and the curse-of-dimensionality").

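To make the multiplicative interaction of bias and variance concrete, here is a minimal numerical sketch (not part of the original exercise; the values of $f(x_0)$, $E\hat f(x_0)$, and the variances are arbitrary illustrative choices). It evaluates the $\Phi$ term of the approximation for an estimate whose mean lies on the correct side and on the wrong side of $1/2$, at shrinking variances: on the correct side the probability falls toward 0, on the wrong side it rises toward 1.

```python
from math import erf, sqrt

def Phi(t):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def mismatch_prob(f_x0, mean_fhat, var_fhat):
    # Normal approximation to Pr(Ghat(x0) != G(x0) | X = x0):
    #   Phi( sign(1/2 - f(x0)) * (E fhat(x0) - 1/2) / sqrt(Var fhat(x0)) ).
    s = -1.0 if f_x0 > 0.5 else 1.0            # sign(1/2 - f(x0))
    return Phi(s * (mean_fhat - 0.5) / sqrt(var_fhat))

f_x0 = 0.8                                      # true f(x0) > 1/2, so G(x0) = 1
for mean_fhat in (0.7, 0.4):                    # correct side vs. wrong side of 1/2
    for var_fhat in (0.04, 0.01, 0.0025):       # shrinking variance
        p = mismatch_prob(f_x0, mean_fhat, var_fhat)
        print(f"E fhat = {mean_fhat}, Var = {var_fhat}: Pr(Ghat != G) ~ {p:.4f}")
```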
Soln. 7.2

First consider the case when $f(x_0) \geq 1/2$. Then $G(x_0) = 1$, and

$$\begin{aligned}
\mathrm{Err}(x_0) &= \Pr(Y \neq \hat G(x_0) \mid X = x_0)\\
&= \Pr(Y = 1 \mid X = x_0)\Pr(\hat G(x_0) = 0 \mid X = x_0) + \Pr(Y = 0 \mid X = x_0)\Pr(\hat G(x_0) = 1 \mid X = x_0)\\
&= f(x_0)\Pr(\hat G(x_0) = 0 \mid X = x_0) + (1 - f(x_0))\bigl(1 - \Pr(\hat G(x_0) = 0 \mid X = x_0)\bigr)\\
&= 1 - f(x_0) + (2f(x_0) - 1)\Pr(\hat G(x_0) = 0 \mid X = x_0)\\
&= \mathrm{Err}_B(x_0) + |2f(x_0) - 1|\Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0).
\end{aligned}$$

In the last step we used the fact that $G(x_0) = 1$, so $\mathrm{Err}_B(x_0) = \Pr(Y \neq G(x_0) \mid X = x_0) = 1 - f(x_0)$, $\Pr(\hat G(x_0) = 0 \mid X = x_0) = \Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0)$, and $2f(x_0) - 1 = |2f(x_0) - 1|$ since $f(x_0) \geq 1/2$. Similar arguments hold for the case when $f(x_0) < 1/2$ and $G(x_0) = 0$. Therefore, we have shown

$$\mathrm{Err}(x_0) = \mathrm{Err}_B(x_0) + |2f(x_0) - 1|\Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0).$$

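As a quick sanity check of this decomposition, the following sketch (illustrative only; the Gaussian model for $\hat f(x_0)$ and all constants are assumptions made for the example) simulates pairs $(Y, \hat f(x_0))$ with $\hat f(x_0)$ independent of $Y$ given $X = x_0$, and compares the empirical $\mathrm{Err}(x_0)$ with $\mathrm{Err}_B(x_0) + |2f(x_0) - 1|\Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0)$; the two numbers should agree up to Monte Carlo error.

```python
import random

random.seed(0)
f_x0, mean_fhat, sd_fhat, n = 0.8, 0.6, 0.2, 200_000

G = 1 if f_x0 > 0.5 else 0                     # Bayes classifier at x0
err = mismatch = 0
for _ in range(n):
    y = 1 if random.random() < f_x0 else 0     # draw Y | X = x0
    fhat = random.gauss(mean_fhat, sd_fhat)    # draw fhat(x0), independent of Y
    ghat = 1 if fhat > 0.5 else 0
    err += (y != ghat)
    mismatch += (ghat != G)

err_b = min(f_x0, 1 - f_x0)                    # Err_B(x0) = Pr(Y != G(x0) | X = x0)
lhs = err / n
rhs = err_b + abs(2 * f_x0 - 1) * (mismatch / n)
print(f"Err(x0) ~ {lhs:.4f}   Err_B + |2f-1| Pr(Ghat != G) ~ {rhs:.4f}")
```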
For the second part, again consider first the case when $f(x_0) \geq 1/2$ (thus $G(x_0) = 1$). In this case, we have

$$\begin{aligned}
\Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0) &= \Pr(\hat G(x_0) = 0 \mid X = x_0)\\
&= \Pr\left(\hat f(x_0) < \tfrac{1}{2}\right)\\
&= \Pr\left(\frac{\hat f(x_0) - E\hat f(x_0)}{\sqrt{\mathrm{Var}(\hat f(x_0))}} < \frac{\tfrac{1}{2} - E\hat f(x_0)}{\sqrt{\mathrm{Var}(\hat f(x_0))}}\right)\\
&\approx \Phi\left(\frac{\mathrm{sign}(\tfrac{1}{2} - f(x_0))\,(E\hat f(x_0) - \tfrac{1}{2})}{\sqrt{\mathrm{Var}(\hat f(x_0))}}\right).
\end{aligned}$$

Note that $f(x_0) \geq 1/2$ implies $\mathrm{sign}(\tfrac{1}{2} - f(x_0)) = -1$, so the argument of $\Phi$ in the last line is exactly $(\tfrac{1}{2} - E\hat f(x_0))/\sqrt{\mathrm{Var}(\hat f(x_0))}$. Similar arguments hold for the case when $f(x_0) < 1/2$ as well.
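As a small numerical check of this last step (again illustrative, with assumed values for $E\hat f(x_0)$ and $\mathrm{Var}(\hat f(x_0))$, and with $\hat f(x_0)$ taken to be exactly Gaussian), the sketch below compares the simulated probability $\Pr(\hat f(x_0) < 1/2)$ with the $\Phi$ expression, using $f(x_0) \geq 1/2$ so that $\mathrm{sign}(1/2 - f(x_0)) = -1$.

```python
import random
from math import erf, sqrt

def Phi(t):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

random.seed(1)
f_x0, mean_fhat, sd_fhat, n = 0.8, 0.6, 0.2, 200_000

# Empirical Pr(fhat(x0) < 1/2) under fhat(x0) ~ N(mean_fhat, sd_fhat^2).
empirical = sum(random.gauss(mean_fhat, sd_fhat) < 0.5 for _ in range(n)) / n

# Phi( sign(1/2 - f(x0)) * (E fhat(x0) - 1/2) / sqrt(Var fhat(x0)) ).
s = -1.0 if f_x0 > 0.5 else 1.0
formula = Phi(s * (mean_fhat - 0.5) / sd_fhat)

print(f"simulated Pr(fhat < 1/2) = {empirical:.4f}, Phi formula = {formula:.4f}")
```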