Ex. 5.15

Ex. 5.15

This exercise derives some of the results quoted in Section 5.8.1. Suppose \(K(x,y)\) satisfying the conditions (5.45) and let \(f(x)\in \mathcal{H}_K\). Show that

(a) \(\langle K(\cdot, x_i), f\rangle_{\mathcal{H}_K} = f(x_i)\).

(b) \(\langle K(\cdot, x_i), K(\cdot, x_j)\rangle_{\mathcal{H}_K} = K(x_i, x_j)\).

(c) If \(g(x) = \sum_{i=1}^N\alpha_iK(x,x_i)\), then

\[\begin{equation} J(g) = \sum_{i=1}^N\sum_{i=1}^NK(x_i, x_j)\alpha_i\alpha_j.\non \end{equation}\]

Suppose that \(\tilde g(x) = g(x) + \rho(x)\), with \(\rho(x) \in \mathcal{H}_K\), and orthogonal in \(\mathcal{H}_K\) to each of \(K(x, x_i)\), i=1,...,N. Show that

(d)

\[\begin{equation} \sum_{i=1}^NL(y_i, \tilde g(x_i)) + \lambda J(\tilde g) \ge \sum_{i=1}^NL(y_i, g(x_i)) + \lambda J(g)\non \end{equation}\]

with equality iff \(\rho(x) = 0\).

Soln. 5.15

(a) Note that by (5.47) in text, the inner product \(\mathcal{H}_K\) is

\[\begin{equation} \left\langle \sum_{j\in J}a_j\phi_i(x), \sum_{j\in J}b_j\phi_j(x) \right\rangle_{\mathcal{H}_K} = \sum_{j\in J}\frac{a_jb_j}{\lambda_j}.\non \end{equation}\]

Therefore, by definition of \(K\) we have

\[\begin{eqnarray} \langle K(\cdot, y), f\rangle_{\mathcal{H}_K} &=& \left\langle \sum_{i=1}^\infty (\gamma_i\phi_i(x))\phi_i(y), \sum_{i=1}^\infty c_i\phi_i(x)\right\rangle \non\\ &=&\sum_{i=1}^\infty \frac{c_i\lambda_i\phi_i(y)}{\lambda_i}\non\\ &=&f(y).\non \end{eqnarray}\]

(b) It follows from (a) by letting \(f(\cdot) = K(\cdot, x_j)\).

(c) From (b) we have

\[\begin{eqnarray} J(g) &=& \left \langle \sum_{i=1}^N\alpha_iK(x, x_i), \sum_{i=1}^N\alpha_iK(x, x_i) \right\rangle\non\\ &=& \sum_{i=1}^N\sum_{i=1}^NK(x_i, x_j)\alpha_i\alpha_j.\non \end{eqnarray}\]

(d) Since \(\rho\) is orthogonal to each \(K(x, x_i)\) for \(i=1,...,N\), we have

\[\begin{equation} \lambda J(\tilde g) = \lambda J(g) + \lambda\|\rho\|^2_{\mathcal{H}_K} \ge \lambda J(g).\non \end{equation}\]

Moreover, from (a), we have

\[\begin{eqnarray} \tilde g(x_i) &=& \langle K(\cdot, x_i), \tilde g \rangle_{\mathcal{H}_K}\non\\ &=& \langle K(\cdot, x_i), g + \rho \rangle_{\mathcal{H}_K}\non\\ &=& \langle K(\cdot, x_i), g \rangle_{\mathcal{H}_K},\non \end{eqnarray}\]

so that

\[\begin{equation} L(y_i, \tilde g(x_i)) = L(y_i, g(x_i)),\non \end{equation}\]

that is, the loss only depends on the data space.

The proof is now complete.

Remark

This is the (simple version of) Representer Theorem.