Ex. 12.1
Ex. 12.1
Show that the criteria (12.25) and (12.8) are equivalent.
Soln. 12.1
For (12.8), the problem (denoted as \(P_1\)) is
For (12.25), the problem (denoted as \(P'_2\)) with \(\lambda=\frac{1}{C}\) is
The objective functions for both problems are the same, up to constant \(\frac{1}{C}\). Without loss of generality, we rewrite \(P'_2\) as \(P_2\):
For any optimal solution \(\{\hat\beta, \hat\beta_0, \hat\eta_i\}\) of \(P_2\), it also satisfies the constraints in \(P_1\), because \(\hat\eta_i=[1-y_i\hat f(x_i)]_+\ge 0\) and \(\hat\eta_i=[1-y_i\hat f(x_i)]_+\ge 1-y_i\hat f(x_i)\) for any \(i\). Therefore, we know the optimal value of \(P_2\) is greater than or equal to that of \(P_1\).
To show the other way around, let's go back to (12.10) - (12.16) in the text which uniquely characterize the solution to \(P_1\). Consider an optimal solution \(\{\hat\beta, \hat\beta_0, \hat\xi_i\}\) of \(P_1\). If \(\hat\xi_i > 1-y_i\hat f(x_i)\), then \(\hat\alpha_i=0\) by (12.14), and \(C-\mu_i=0\) by (12.12), so that \(\xi_i\) is eliminated in (12.9). In other words, the \(i\)-th data sample is not a support vector. Such \(\hat\xi_i\) have no impact on the optimal solution, therefore the focus is on \(\hat\xi_i = 1-y_i\hat f(x_i) = [1-y_i\hat f(x_i)]_+ = \hat \eta_i\) since \(\hat \xi_i\ge 0\). This reduces to constraints in \(P_2\).
Therefore, the two problems have the same solution.