Ex. 10.10
Show that for \(K=2\) class classification, only one tree needs to be grown at each gradient-boosting iteration.
Soln. 10.10
For \(K\)-class classification, the loss function (for a single training sample) is the multinomial deviance
\[\begin{eqnarray}
L(y, p(x)) &=& -\sum_{k=1}^K\bb{1}(y=G_k)\log p_k(x)\non\\
&=&-\sum_{k=1}^K\bb{1}(y=G_k)f_k(x) +\log\left(\sum_{l=1}^Ke^{f_l(x)}\right),\non
\end{eqnarray}\]
where
\[\begin{equation}
p_k(x) = \frac{e^{f_k(x)}}{\sum_{l=1}^Ke^{f_l(x)}}.\non
\end{equation}\]
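The second line of the deviance follows by substituting this expression for \(p_k(x)\) and using the fact that the indicators sum to one, \(\sum_{k=1}^K\bb{1}(y=G_k)=1\):
\[\begin{eqnarray}
-\sum_{k=1}^K\bb{1}(y=G_k)\log p_k(x) &=& -\sum_{k=1}^K\bb{1}(y=G_k)\left(f_k(x) - \log\sum_{l=1}^Ke^{f_l(x)}\right)\non\\
&=& -\sum_{k=1}^K\bb{1}(y=G_k)f_k(x) + \log\left(\sum_{l=1}^Ke^{f_l(x)}\right).\non
\end{eqnarray}\]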
At each iteration, \(K\) least-squares trees are constructed, each fit to its respective negative gradient
\[\begin{equation}
\bb{1}(y=G_k) - p_k(x) \text{ for } k=1,...,K.\non
\end{equation}\]
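This negative gradient is obtained by differentiating the deviance with respect to \(f_k(x)\):
\[\begin{eqnarray}
-\frac{\partial L(y, p(x))}{\partial f_k(x)} &=& \bb{1}(y=G_k) - \frac{e^{f_k(x)}}{\sum_{l=1}^Ke^{f_l(x)}}\non\\
&=& \bb{1}(y=G_k) - p_k(x).\non
\end{eqnarray}\]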
When \(K=2\), we have \(p_1(x) + p_2(x) = 1\) and \(\bb{1}(y=G_1) + \bb{1}(y=G_2) = 1\). The first least-squares tree \(T_1\) is fit to
\[\begin{equation}
\bb{1}(y=G_1) - p_1(x),\non
\end{equation}\]
which is the negative of the target of the second tree \(T_2\):
\[\begin{equation}
\bb{1}(y=G_2) - p_2(x).\non
\end{equation}\]
To see this, note that
\[\begin{equation}
\bb{1}(y=G_1) - p_1(x) = (1-\bb{1}(y=G_2)) - (1-p_2(x)) = -(\bb{1}(y=G_2) - p_2(x)).\non
\end{equation}\]
Therefore, once \(T_1\) has been grown, the least-squares tree fit to the negated targets has the same splits with negated leaf values, so \(T_2\) is obtained from \(T_1\) simply by flipping the sign of its fitted values, and only one tree needs to be grown per iteration.
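As a sanity check, here is a minimal numerical sketch (not part of the original solution) of a single boosting iteration for \(K=2\), using scikit-learn's DecisionTreeRegressor as the least-squares base learner. The data, tree depth, and variable names are illustrative assumptions.

```python
# Minimal sketch: one gradient-boosting iteration for K = 2, showing that the
# class-2 tree is just the class-1 tree with its sign flipped.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # 1 codes G_1, 0 codes G_2

# Current scores f_k(x) (initialised to zero) and softmax probabilities p_k(x).
f = np.zeros((200, 2))
p = np.exp(f) / np.exp(f).sum(axis=1, keepdims=True)

# Negative gradients 1(y = G_k) - p_k(x) for k = 1, 2.
r1 = (y == 1).astype(float) - p[:, 0]
r2 = (y == 0).astype(float) - p[:, 1]
assert np.allclose(r2, -r1)  # the two regression targets are exact negatives

# Grow a single least-squares tree T_1 on r1.
T1 = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, r1)

# A tree grown on r2 = -r1 finds the same splits with negated leaf values,
# so its predictions are simply -T1.predict(X); no second tree is needed.
T2 = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, r2)
assert np.allclose(T2.predict(X), -T1.predict(X))
```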