Ex. 10.9

Consider a \(K\)-class problem where that targets \(y_{ik}\) are coded as 1 if observation \(i\) is in class \(k\) and zero otherwise. Using the multinomial deviance loss function (10.22) and the symmetric logistic transform, use the arguments leading to the gradient boosting Algorithm 10.3 to derive Algorithm 10.4.

Hint: See Ex. 10.8 for step 2(b) iii.

Soln. 10.9

It's enough to show following items:

(1) From (10.21) in the textbook we get Algorithm 10.4 (a).

(2) From Table 10.2 and Algorithm 10.3 (a) in the textbook we get Algorithm 10.4 (b) i.

(3) From Ex. 10.8 (c) we get Algorithm 10.4 (b) iii. To see that, it suffices to show that

\[\begin{equation} \sum_{x_i\in R_{ikm}}|r_{ikm}|(1-|r_{ikm}|) = \sum_{x_{ikm}\in R_{ikm}}p_{k}(x_i)(1-p_k(x_i)).\non \end{equation}\]

Note that when \(y_{ik} = 1\), \(r_{ikm} = 1 - p_k(x_i)\) and when \(y_{ik} = 0\), \(r_{ikm} = - p_k(x_i)\). Therefore in both cases, we have

\[\begin{equation} |r_{ikm}|(1-|r_{ikm}|) = p_{k}(x_i)(1-p_k(x_i)).\non \end{equation}\]