Ex. 12.1

Ex. 12.1

Show that the criteria (12.25) and (12.8) are equivalent.

Soln. 12.1

For (12.8), the problem (denoted as \(P_1\)) is

\[\begin{eqnarray} \label{eq:12-1a} P_1: &&\min_{\beta, \beta_0}\ \ \ \frac{1}{2} \|\beta\|^2 + C\sum_{i=1}^N\xi_i \non \\ &&\text{subject to} \ \ \ \xi_i\ge 0, \ \xi_i \ge 1 - y_if(x_i) \ \forall i\non \end{eqnarray}\]

For (12.25), the problem (denoted as \(P'_2\)) with \(\lambda=\frac{1}{C}\) is

\[\begin{eqnarray} \label{eq:12-1b} P'_2: &&\min_{\beta, \beta_0}\ \ \ \frac{1}{C}\left[ \frac{1}{2} \|\beta\|^2 + C\sum_{i=1}^N\eta_i \right]\non \\ &&\text{subject to} \ \ \ \eta_i = [1-y_if(x_i)]_+ \ \forall i\non \end{eqnarray}\]

The objective functions for both problems are the same, up to constant \(\frac{1}{C}\). Without loss of generality, we rewrite \(P'_2\) as \(P_2\):

\[\begin{eqnarray} \label{eq:12-1bb} P_2: &&\min_{\beta, \beta_0}\ \ \ \frac{1}{2} \|\beta\|^2 + C\sum_{i=1}^N\eta_i \non \\ &&\text{subject to} \ \ \ \eta_i = [1-y_if(x_i)]_+ \ \forall i\non \end{eqnarray}\]

For any optimal solution \(\{\hat\beta, \hat\beta_0, \hat\eta_i\}\) of \(P_2\), it also satisfies the constraints in \(P_1\), because \(\hat\eta_i=[1-y_i\hat f(x_i)]_+\ge 0\) and \(\hat\eta_i=[1-y_i\hat f(x_i)]_+\ge 1-y_i\hat f(x_i)\) for any \(i\). Therefore, we know the optimal value of \(P_2\) is greater than or equal to that of \(P_1\).

To show the other way around, let's go back to (12.10) - (12.16) in the text which uniquely characterize the solution to \(P_1\). Consider an optimal solution \(\{\hat\beta, \hat\beta_0, \hat\xi_i\}\) of \(P_1\). If \(\hat\xi_i > 1-y_i\hat f(x_i)\), then \(\hat\alpha_i=0\) by (12.14), and \(C-\mu_i=0\) by (12.12), so that \(\xi_i\) is eliminated in (12.9). In other words, the \(i\)-th data sample is not a support vector. Such \(\hat\xi_i\) have no impact on the optimal solution, therefore the focus is on \(\hat\xi_i = 1-y_i\hat f(x_i) = [1-y_i\hat f(x_i)]_+ = \hat \eta_i\) since \(\hat \xi_i\ge 0\). This reduces to constraints in \(P_2\).

Therefore, the two problems have the same solution.