Ex. 8.5
Suggest generalizations of each of the loss functions in Figure 10.4 to more than two classes, and design an appropriate plot to compare them.
Soln. 8.5
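Following the idea of multi-class AdaBoost (see Ex. 10.5 as well), for a \(K\)-class classification problem, consider the coding \(Y=(Y_1,...,Y_K)^T\) with
\[
Y_k = \begin{cases} 1, & \text{if } G = \mathcal{G}_k, \\ -\frac{1}{K-1}, & \text{otherwise.} \end{cases}
\]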
Let \(f=(f_1,...,f_K)^T\) with \(\sum_{k=1}^Kf_k=0\). The exponential loss is defined by
\[
L(Y, f) = \exp\left(-\frac{1}{K}Y^Tf\right).
\]
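The coding and the exponential loss can be checked numerically. The following is a minimal sketch, assuming the coding of Ex. 10.5 (\(Y_k=1\) for the true class and \(-1/(K-1)\) otherwise) and the loss \(\exp(-Y^Tf/K)\); the helper names `code_label` and `exponential_loss` are hypothetical and chosen only for illustration.

```python
import math

def code_label(k, K):
    """Return the K-vector Y with 1 at index k and -1/(K-1) elsewhere (assumed coding)."""
    return [1.0 if j == k else -1.0 / (K - 1) for j in range(K)]

def exponential_loss(Y, f):
    """Multi-class exponential loss exp(-Y^T f / K) (assumed form)."""
    K = len(Y)
    margin = sum(y * fk for y, fk in zip(Y, f))
    return math.exp(-margin / K)

# K = 2 check: Y = (y, -y) and f = (f1, -f1) give Y^T f / K = y * f1,
# so the loss should equal the two-class exponential loss exp(-y * f1).
Y = code_label(0, 2)                 # corresponds to y = +1
f = [0.7, -0.7]                      # sums to zero, f1 = 0.7
two_class = math.exp(-1.0 * 0.7)
multi_class = exponential_loss(Y, f)
```

Under these assumptions `multi_class` and `two_class` agree, which is the sense in which the coded loss generalizes the two-class curve of Figure 10.4.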
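Similarly, a multinomial deviance loss that reduces to the binomial deviance \(\log(1+e^{-2yf})\) of Figure 10.4 when \(K=2\) can be defined by
\[
L(Y, f) = \log\left(1 + \exp\left(-\frac{2}{K}Y^Tf\right)\right).
\]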
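For misclassification loss, we can further restrict \(f\) to be in the same form as \(Y\), that is,
\[
f_k = \begin{cases} 1, & \text{if } G(x) = \mathcal{G}_k, \\ -\frac{1}{K-1}, & \text{otherwise,} \end{cases}
\]
where \(G(x)\) denotes the predicted class.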
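When \(K=2\), this coincides with the decision boundary \(f(x)=0\). Therefore, we can let the loss be
\[
L(Y, f) = I(Y^Tf < 0),
\]
since, with \(f\) restricted as above, \(Y^Tf = K/(K-1) > 0\) when the predicted and true classes agree and \(Y^Tf = -K/(K-1)^2 < 0\) otherwise.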
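The sign argument can be verified directly. The sketch below assumes that both \(Y\) and \(f\) use the coding with 1 in one position and \(-1/(K-1)\) elsewhere, and that the loss is the indicator of \(Y^Tf<0\); the function names are hypothetical.

```python
def code_label(k, K):
    """K-vector with 1 at index k and -1/(K-1) elsewhere (assumed coding)."""
    return [1.0 if j == k else -1.0 / (K - 1) for j in range(K)]

def inner(Y, f):
    """Inner product Y^T f."""
    return sum(y * fk for y, fk in zip(Y, f))

def misclassification_loss(Y, f):
    """Indicator of Y^T f < 0 (assumed form of the loss)."""
    return 1 if inner(Y, f) < 0 else 0

K = 4
truth = code_label(2, K)
agree = inner(truth, code_label(2, K))      # prediction equals the true class
disagree = inner(truth, code_label(0, K))   # prediction is a different class
```

For \(K=4\) the two inner products are \(K/(K-1)=4/3\) and \(-K/(K-1)^2=-4/9\), so the indicator separates correct from incorrect predictions.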
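Similar to misclassification loss, the squared error would be
\[
L(Y, f) = \|Y - f\|^2 = \sum_{k=1}^K (Y_k - f_k)^2,
\]
which for \(K=2\), with \(Y=(y,-y)^T\) and \(f=(f_1,-f_1)^T\), equals \(2(y-f_1)^2\).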
The support vector error is
\[
L(Y, f) = \left(1 - \frac{1}{K}Y^Tf\right)_+.
\]
As for the plot, it suffices to change the \(x\)-axis in Figure 10.4 from \(yf\) to \(Y^Tf\).
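A minimal sketch of such a plot follows. It assumes that each curve is the two-class loss of Figure 10.4 evaluated at the scaled margin \(m = Y^Tf/K\), which equals \(yf\) when \(K=2\) and differs from \(Y^Tf\) only by the factor \(1/K\); the deviance is divided by \(\log 2\) so that it passes through \((0,1)\), as in the figure. The function names are hypothetical, and `plot_losses` additionally assumes that matplotlib is installed.

```python
import math

def loss_curves(margins):
    """Evaluate each loss on a grid of scaled margins m = Y^T f / K (assumed forms)."""
    return {
        "misclassification": [1.0 if m < 0 else 0.0 for m in margins],
        "exponential": [math.exp(-m) for m in margins],
        # Scaled by 1/log(2) so the curve passes through (0, 1).
        "deviance": [math.log1p(math.exp(-2 * m)) / math.log(2) for m in margins],
        "squared error": [(1 - m) ** 2 for m in margins],
        "support vector": [max(0.0, 1 - m) for m in margins],
    }

def plot_losses(lo=-2.0, hi=2.0, n=201):
    """Draw all curves against the scaled margin; requires matplotlib."""
    import matplotlib.pyplot as plt
    margins = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    for name, values in loss_curves(margins).items():
        plt.plot(margins, values, label=name)
    plt.xlabel("scaled margin  Y^T f / K")
    plt.ylabel("loss")
    plt.legend()
    plt.show()

# Values of every curve at m = 0, without drawing anything.
curves_at_zero = {name: values[0] for name, values in loss_curves([0.0]).items()}
```

Calling `plot_losses()` draws the five curves on a common axis; `curves_at_zero` only evaluates them at \(m=0\) and needs no plotting library.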