Ex. 5.15
This exercise derives some of the results quoted in Section 5.8.1. Suppose that \(K(x,y)\) satisfies the conditions (5.45), and let \(f(x)\in \mathcal{H}_K\). Show that
(a) \(\langle K(\cdot, x_i), f\rangle_{\mathcal{H}_K} = f(x_i)\).
(b) \(\langle K(\cdot, x_i), K(\cdot, x_j)\rangle_{\mathcal{H}_K} = K(x_i, x_j)\).
(c) If \(g(x) = \sum_{i=1}^N\alpha_iK(x,x_i)\), then
\[\begin{equation}
J(g) = \sum_{i=1}^N\sum_{j=1}^NK(x_i, x_j)\alpha_i\alpha_j.\non
\end{equation}\]
Suppose that \(\tilde g(x) = g(x) + \rho(x)\), with \(\rho(x) \in \mathcal{H}_K\) orthogonal in \(\mathcal{H}_K\) to each of \(K(x, x_i)\), \(i=1,\ldots,N\). Show that
(d)
\[\begin{equation}
\sum_{i=1}^NL(y_i, \tilde g(x_i)) + \lambda J(\tilde g) \ge \sum_{i=1}^NL(y_i, g(x_i)) + \lambda J(g)\non
\end{equation}\]
with equality iff \(\rho(x) = 0\).
Soln. 5.15
(a) Note that by (5.47) in the text, the inner product on \(\mathcal{H}_K\) is
\[\begin{equation}
\left\langle \sum_{j\in J}a_j\phi_j(x), \sum_{j\in J}b_j\phi_j(x) \right\rangle_{\mathcal{H}_K} = \sum_{j\in J}\frac{a_jb_j}{\gamma_j}.\non
\end{equation}\]
Therefore, by definition of \(K\) we have
\[\begin{eqnarray}
\langle K(\cdot, y), f\rangle_{\mathcal{H}_K} &=& \left\langle \sum_{i=1}^\infty \gamma_i\phi_i(x)\phi_i(y), \sum_{i=1}^\infty c_i\phi_i(x)\right\rangle_{\mathcal{H}_K} \non\\
&=&\sum_{i=1}^\infty \frac{c_i\gamma_i\phi_i(y)}{\gamma_i}\non\\
&=&\sum_{i=1}^\infty c_i\phi_i(y) = f(y).\non
\end{eqnarray}\]
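The reproducing property in (a) can be checked numerically with a finite eigen-expansion. The sketch below uses a hand-picked three-term feature map \(\phi\), eigenvalues \(\gamma\), and coefficients \(c\) (all illustrative choices, not from the text); with finitely many terms the identity \(\langle K(\cdot,y), f\rangle_{\mathcal{H}_K} = f(y)\) holds exactly up to floating point.

```python
import numpy as np

# Finite eigen-expansion: K(x, y) = sum_i gamma_i * phi_i(x) * phi_i(y),
# with illustrative eigenvalues gamma_i and features phi_i (here 1, x, x^2).
gamma = np.array([2.0, 1.0, 0.5])
phi = lambda x: np.array([1.0, x, x**2])

# f(x) = sum_i c_i * phi_i(x) for illustrative coefficients c_i.
c = np.array([0.3, -1.2, 0.7])
f = lambda x: np.sum(c * phi(x))

# Inner product on H_K: <sum a_j phi_j, sum b_j phi_j> = sum_j a_j b_j / gamma_j.
def inner(a, b):
    return np.sum(a * b / gamma)

# K(., y) has expansion coefficients a_i = gamma_i * phi_i(y), so
# <K(., y), f> = sum_i (gamma_i phi_i(y)) c_i / gamma_i = sum_i c_i phi_i(y) = f(y).
y = 0.8
lhs = inner(gamma * phi(y), c)
print(abs(lhs - f(y)) < 1e-12)
```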
(b) It follows from (a) by letting \(f(\cdot) = K(\cdot, x_j)\).
(c) From (b) we have
\[\begin{eqnarray}
J(g) &=& \left \langle \sum_{i=1}^N\alpha_iK(x, x_i), \sum_{j=1}^N\alpha_jK(x, x_j) \right\rangle_{\mathcal{H}_K}\non\\
&=& \sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_j\langle K(\cdot, x_i), K(\cdot, x_j)\rangle_{\mathcal{H}_K}\non\\
&=& \sum_{i=1}^N\sum_{j=1}^NK(x_i, x_j)\alpha_i\alpha_j.\non
\end{eqnarray}\]
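Part (c) says the penalty of \(g = \sum_i \alpha_i K(\cdot, x_i)\) is the quadratic form \(\alpha^T\mathbf{K}\alpha\) in the kernel matrix \(\mathbf{K} = \{K(x_i,x_j)\}\). A quick numerical sketch (Gaussian kernel and random points, both illustrative choices) confirms the double sum and the quadratic form agree:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
x = rng.normal(size=N)       # sample points x_1, ..., x_N
alpha = rng.normal(size=N)   # coefficients alpha_i

# Gaussian kernel matrix K_ij = K(x_i, x_j) (an illustrative choice of kernel).
K = np.exp(-0.5 * (x[:, None] - x[None, :])**2)

# J(g) written as the double sum from part (c) ...
J_double_sum = sum(K[i, j] * alpha[i] * alpha[j]
                   for i in range(N) for j in range(N))
# ... equals the quadratic form alpha^T K alpha.
J_quadratic = alpha @ K @ alpha
print(np.isclose(J_double_sum, J_quadratic))
```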
(d) Since \(\rho\) is orthogonal to each \(K(x, x_i)\) for \(i=1,\ldots,N\), it is orthogonal to \(g\), which lies in their span, and hence
\[\begin{equation}
\lambda J(\tilde g) = \lambda J(g) + \lambda\|\rho\|^2_{\mathcal{H}_K} \ge \lambda J(g).\non
\end{equation}\]
Moreover, from (a), we have
\[\begin{eqnarray}
\tilde g(x_i) &=& \langle K(\cdot, x_i), \tilde g \rangle_{\mathcal{H}_K}\non\\
&=& \langle K(\cdot, x_i), g + \rho \rangle_{\mathcal{H}_K}\non\\
&=& \langle K(\cdot, x_i), g \rangle_{\mathcal{H}_K}\non\\
&=& g(x_i),\non
\end{eqnarray}\]
so that
\[\begin{equation}
L(y_i, \tilde g(x_i)) = L(y_i, g(x_i)),\non
\end{equation}\]
that is, the loss term is unchanged: it depends only on the values of the function at the data points, and these are not affected by adding \(\rho\).
Combining this with the penalty inequality above gives the claimed inequality, with equality if and only if \(\|\rho\|^2_{\mathcal{H}_K} = 0\), that is, \(\rho = 0\). The proof is now complete.
Remark
This is a simple version of the representer theorem: a minimizer of the penalized loss can always be taken to lie in the finite-dimensional span of the kernel functions \(K(\cdot, x_i)\) evaluated at the data points.