Ex. 7.2

For 0-1 loss with $Y \in \{0, 1\}$ and $\Pr(Y = 1 \mid x_0) = f(x_0)$, show that

$$\mathrm{Err}(x_0) = \Pr(Y \neq \hat G(x_0) \mid X = x_0) = \mathrm{Err}_B(x_0) + |2f(x_0) - 1|\Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0),$$

where $\hat G(x) = \mathbf{1}(\hat f(x) > 1/2)$, $G(x) = \mathbf{1}(f(x) > 1/2)$ is the Bayes classifier, and $\mathrm{Err}_B(x_0) = \Pr(Y \neq G(x_0) \mid X = x_0)$, the irreducible Bayes error at $x_0$. Using the approximation $\hat f(x_0) \sim N(E\hat f(x_0), \mathrm{Var}(\hat f(x_0)))$, show that

$$\Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0) \approx \Phi\left(\frac{\mathrm{sign}(\tfrac{1}{2} - f(x_0))\,(E\hat f(x_0) - \tfrac{1}{2})}{\sqrt{\mathrm{Var}(\hat f(x_0))}}\right).$$

In the above

$$\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} \exp(-u^2/2)\, du,$$

the cumulative Gaussian distribution function. This is an increasing function, with value 0 at $t = -\infty$ and value 1 at $t = +\infty$.

We can think of $\mathrm{sign}(\tfrac{1}{2} - f(x_0))(E\hat f(x_0) - \tfrac{1}{2})$ as a kind of boundary-bias term, as it depends on the true $f(x_0)$ only through which side of the boundary ($\tfrac{1}{2}$) it lies on. Notice also that the bias and variance combine in a multiplicative rather than additive fashion. If $E\hat f(x_0)$ is on the same side of $\tfrac{1}{2}$ as $f(x_0)$, then the bias is negative, and decreasing the variance will decrease the misclassification error. On the other hand, if $E\hat f(x_0)$ is on the opposite side of $\tfrac{1}{2}$ to $f(x_0)$, then the bias is positive and it pays to increase the variance! Such an increase will improve the chance that $\hat f(x_0)$ falls on the correct side of $\tfrac{1}{2}$ (Friedman, "On bias, variance, 0/1-loss, and the curse-of-dimensionality").

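To make the multiplicative interaction of bias and variance concrete, here is a minimal numerical sketch (not part of the original exercise; the values of $f(x_0)$, $E\hat f(x_0)$, and the variances are arbitrary illustrative choices). It evaluates the $\Phi$ term of the approximation for an estimate whose mean lies on the correct side and on the wrong side of $1/2$, at shrinking variances: on the correct side the probability falls toward 0, on the wrong side it rises toward 1.

```python
from math import erf, sqrt

def Phi(t):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def mismatch_prob(f_x0, mean_fhat, var_fhat):
    # Normal approximation to Pr(Ghat(x0) != G(x0) | X = x0):
    #   Phi( sign(1/2 - f(x0)) * (E fhat(x0) - 1/2) / sqrt(Var fhat(x0)) ).
    s = -1.0 if f_x0 > 0.5 else 1.0            # sign(1/2 - f(x0))
    return Phi(s * (mean_fhat - 0.5) / sqrt(var_fhat))

f_x0 = 0.8                                      # true f(x0) > 1/2, so G(x0) = 1
for mean_fhat in (0.7, 0.4):                    # correct side vs. wrong side of 1/2
    for var_fhat in (0.04, 0.01, 0.0025):       # shrinking variance
        p = mismatch_prob(f_x0, mean_fhat, var_fhat)
        print(f"E fhat = {mean_fhat}, Var = {var_fhat}: Pr(Ghat != G) ~ {p:.4f}")
```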
Soln. 7.2

First consider the case when $f(x_0) \geq 1/2$. Then $G(x_0) = 1$, and

$$\begin{aligned}
\mathrm{Err}(x_0) &= \Pr(Y \neq \hat G(x_0) \mid X = x_0)\\
&= \Pr(Y = 1 \mid X = x_0)\Pr(\hat G(x_0) = 0 \mid X = x_0) + \Pr(Y = 0 \mid X = x_0)\Pr(\hat G(x_0) = 1 \mid X = x_0)\\
&= f(x_0)\Pr(\hat G(x_0) = 0 \mid X = x_0) + (1 - f(x_0))\bigl(1 - \Pr(\hat G(x_0) = 0 \mid X = x_0)\bigr)\\
&= 1 - f(x_0) + (2f(x_0) - 1)\Pr(\hat G(x_0) = 0 \mid X = x_0)\\
&= \mathrm{Err}_B(x_0) + |2f(x_0) - 1|\Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0).
\end{aligned}$$

In the last step we used the fact that $G(x_0) = 1$, so $\mathrm{Err}_B(x_0) = \Pr(Y \neq G(x_0) \mid X = x_0) = 1 - f(x_0)$, $\Pr(\hat G(x_0) = 0 \mid X = x_0) = \Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0)$, and $2f(x_0) - 1 = |2f(x_0) - 1|$ since $f(x_0) \geq 1/2$. Similar arguments hold for the case when $f(x_0) < 1/2$ and $G(x_0) = 0$. Therefore, we have shown

$$\mathrm{Err}(x_0) = \mathrm{Err}_B(x_0) + |2f(x_0) - 1|\Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0).$$

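As a quick sanity check of this decomposition, the following sketch (illustrative only; the Gaussian model for $\hat f(x_0)$ and all constants are assumptions made for the example) simulates pairs $(Y, \hat f(x_0))$ with $\hat f(x_0)$ independent of $Y$ given $X = x_0$, and compares the empirical $\mathrm{Err}(x_0)$ with $\mathrm{Err}_B(x_0) + |2f(x_0) - 1|\Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0)$; the two numbers should agree up to Monte Carlo error.

```python
import random

random.seed(0)
f_x0, mean_fhat, sd_fhat, n = 0.8, 0.6, 0.2, 200_000

G = 1 if f_x0 > 0.5 else 0                     # Bayes classifier at x0
err = mismatch = 0
for _ in range(n):
    y = 1 if random.random() < f_x0 else 0     # draw Y | X = x0
    fhat = random.gauss(mean_fhat, sd_fhat)    # draw fhat(x0), independent of Y
    ghat = 1 if fhat > 0.5 else 0
    err += (y != ghat)
    mismatch += (ghat != G)

err_b = min(f_x0, 1 - f_x0)                    # Err_B(x0) = Pr(Y != G(x0) | X = x0)
lhs = err / n
rhs = err_b + abs(2 * f_x0 - 1) * (mismatch / n)
print(f"Err(x0) ~ {lhs:.4f}   Err_B + |2f-1| Pr(Ghat != G) ~ {rhs:.4f}")
```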
For the second part, again consider first the case when $f(x_0) \geq 1/2$ (thus $G(x_0) = 1$). In this case, we have

$$\begin{aligned}
\Pr(\hat G(x_0) \neq G(x_0) \mid X = x_0) &= \Pr(\hat G(x_0) = 0 \mid X = x_0)\\
&= \Pr\left(\hat f(x_0) < \tfrac{1}{2}\right)\\
&= \Pr\left(\frac{\hat f(x_0) - E\hat f(x_0)}{\sqrt{\mathrm{Var}(\hat f(x_0))}} < \frac{\tfrac{1}{2} - E\hat f(x_0)}{\sqrt{\mathrm{Var}(\hat f(x_0))}}\right)\\
&\approx \Phi\left(\frac{\mathrm{sign}(\tfrac{1}{2} - f(x_0))\,(E\hat f(x_0) - \tfrac{1}{2})}{\sqrt{\mathrm{Var}(\hat f(x_0))}}\right).
\end{aligned}$$

Note that $f(x_0) \geq 1/2$ implies $\mathrm{sign}(\tfrac{1}{2} - f(x_0)) = -1$, so the argument of $\Phi$ in the last line is exactly $(\tfrac{1}{2} - E\hat f(x_0))/\sqrt{\mathrm{Var}(\hat f(x_0))}$. Similar arguments hold for the case when $f(x_0) < 1/2$ as well.
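As a small numerical check of this last step (again illustrative, with assumed values for $E\hat f(x_0)$ and $\mathrm{Var}(\hat f(x_0))$, and with $\hat f(x_0)$ taken to be exactly Gaussian), the sketch below compares the simulated probability $\Pr(\hat f(x_0) < 1/2)$ with the $\Phi$ expression, using $f(x_0) \geq 1/2$ so that $\mathrm{sign}(1/2 - f(x_0)) = -1$.

```python
import random
from math import erf, sqrt

def Phi(t):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

random.seed(1)
f_x0, mean_fhat, sd_fhat, n = 0.8, 0.6, 0.2, 200_000

# Empirical Pr(fhat(x0) < 1/2) under fhat(x0) ~ N(mean_fhat, sd_fhat^2).
empirical = sum(random.gauss(mean_fhat, sd_fhat) < 0.5 for _ in range(n)) / n

# Phi( sign(1/2 - f(x0)) * (E fhat(x0) - 1/2) / sqrt(Var fhat(x0)) ).
s = -1.0 if f_x0 > 0.5 else 1.0
formula = Phi(s * (mean_fhat - 0.5) / sd_fhat)

print(f"simulated Pr(fhat < 1/2) = {empirical:.4f}, Phi formula = {formula:.4f}")
```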