Ex. 10.10

Show that for $K = 2$ class classification, only one tree needs to be grown at each gradient-boosting iteration.

Soln. 10.10

For classification, the loss function (for a single training example) is the multinomial deviance

$$L(y, p(x)) = -\sum_{k=1}^{K} \mathbb{1}(y = G_k)\, \log p_k(x) = -\sum_{k=1}^{K} \mathbb{1}(y = G_k)\, f_k(x) + \log\left(\sum_{l=1}^{K} e^{f_l(x)}\right),$$

where the second equality uses $\sum_{k=1}^{K} \mathbb{1}(y = G_k) = 1$ and

$$p_k(x) = \frac{e^{f_k(x)}}{\sum_{l=1}^{K} e^{f_l(x)}}.$$

Then $K$ least-squares trees are constructed at each iteration, each fit to its respective negative gradient

$$\mathbb{1}(y = G_k) - p_k(x), \qquad k = 1, \dots, K.$$
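As a concrete illustration, here is a minimal NumPy sketch (the function name `negative_gradients` and the toy data are illustrative choices, not from the text) that computes the softmax probabilities $p_k(x)$ and the residuals $\mathbb{1}(y = G_k) - p_k(x)$ that each tree is fit to; for $K = 2$ the two residual columns come out as exact negatives of each other.

```python
import numpy as np

def negative_gradients(F, y, K):
    """Negative gradients of the multinomial deviance.

    F : (n, K) array of current model scores f_k(x_i)
    y : (n,) array of class labels in {0, ..., K-1}
    Returns an (n, K) array whose k-th column is 1(y = G_k) - p_k(x).
    """
    # Softmax probabilities p_k(x) = exp(f_k) / sum_l exp(f_l),
    # shifted by the row max for numerical stability.
    P = np.exp(F - F.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    Y = np.eye(K)[y]   # one-hot encoding of 1(y = G_k)
    return Y - P       # residuals that the K trees are fit to

# Toy example: n = 5 samples, K = 2 classes
rng = np.random.default_rng(0)
F = rng.normal(size=(5, 2))
y = np.array([0, 1, 1, 0, 1])
R = negative_gradients(F, y, 2)
print(np.allclose(R[:, 0], -R[:, 1]))  # True: the two columns are negatives
```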

When $K = 2$ we have $p_1(x) + p_2(x) = 1$. When we build the first least-squares tree $T_1$, it is fit to

$$\mathbb{1}(y = G_1) - p_1(x),$$

which is the negative of the target of the second tree $T_2$:

$$\mathbb{1}(y = G_2) - p_2(x).$$

To see this, note that $\mathbb{1}(y = G_1) = 1 - \mathbb{1}(y = G_2)$, since every sample belongs to exactly one of the two classes, so

$$\mathbb{1}(y = G_1) - p_1(x) = \bigl(1 - \mathbb{1}(y = G_2)\bigr) - \bigl(1 - p_2(x)\bigr) = -\bigl(\mathbb{1}(y = G_2) - p_2(x)\bigr).$$

Therefore, once we build $T_1$, we can obtain $T_2$ by flipping the sign: a least-squares tree fit to the negated targets makes the same splits (squared error is unchanged by negating the responses) and has negated leaf values, so $T_2 = -T_1$ and only one tree needs to be grown per iteration.
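This can be checked numerically. The sketch below (assuming scikit-learn is available; the dataset, tree depth, and seed are arbitrary choices) fits one least-squares tree to the residual of class $G_1$ and another to its negation; because the squared-error splitting criterion is invariant under negating the targets, the two trees choose the same splits with opposite leaf values.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
p1 = 1 / (1 + np.exp(-X[:, 0]))         # some probability p_1(x)
y = (rng.random(200) < p1).astype(int)  # binary labels, class G_1 coded as 1
r1 = (y == 1).astype(float) - p1        # target of T_1: 1(y = G_1) - p_1(x)
r2 = -r1                                # target of T_2, by the identity above

t1 = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, r1)
t2 = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, r2)

# The two trees make the same splits with opposite leaf values,
# so t2's predictions are exactly the negatives of t1's.
print(np.allclose(t1.predict(X), -t2.predict(X)))  # True
```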