Ex. 10.10
Ex. 10.10
Show that for class classification, only one tree needs to be grown at each gradient-boosting iteration.
Soln. 10.10
For classification the loss function (for a single training sample) is the multinomial deviance
where
Then least square trees are constructed at each iteration, with each tree is fit to its respective negative gradient
When , we have . When we build the first least square tree , it is fit to
which is the negative of the target of the second tree :
To see that, note
Therefore, once we build , we can flip the sign and get .