Ex. 18.13
Suppose our \(p>N\) predictors are presented as an \(N\times N\) inner-product matrix \(\bb{K}=\bb{X}\bb{X}^T\), and we wish to fit the equivalent of a linear logistic regression model in the original features with quadratic regularization. Our predictions are also to be made using inner products; a new \(x_0\) is presented as \(k_0=\bX x_0\). Let \(\bb{K}=\bU\bD^2\bU^T\) be the eigen-decomposition of \(\bb{K}\). Show that the predictions are given by \(\hat f_0 = k_0^T\hat\alpha\), where
(a) \(\hat\alpha = \bU\bD^{-1}\hat\beta\), and
(b) \(\hat\beta\) is the ridged logistic regression estimate with input matrix \(\bb{R}=\bU\bD\).
Argue that the same approach can be used for any appropriate kernel matrix \(\bb{K}\).
Soln. 18.13
Let \(\hat\alpha\) and \(\hat\beta\) be defined as in (a) and (b), and let \(\bX=\bU\bD\bV^T\) be the SVD of the data matrix, so that \(\bb{K}=\bX\bX^T=\bU\bD^2\bU^T\). Note that
\begin{align*}
\hat f_0 = k_0^T\hat\alpha &= x_0^T\bX^T\bU\bD^{-1}\hat\beta\\
&= x_0^T\bV\bD\bU^T\bU\bD^{-1}\hat\beta\\
&= x_0^T\bV\hat\beta,
\end{align*}
which is exactly the same as the prediction given by (18.15) in the text (see Ex. 18.4 for a proof).
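As a quick numerical sanity check (a sketch, not part of the original derivation; here \texttt{beta\_hat} is an arbitrary vector standing in for the actual ridged logistic fit on \(\bb{R}=\bU\bD\)), we can confirm that \(k_0^T\hat\alpha\) and \(x_0^T\bV\hat\beta\) agree:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 100                            # p > N, as in the exercise
X = rng.standard_normal((N, p))
x0 = rng.standard_normal(p)
beta_hat = rng.standard_normal(N)         # stand-in for the ridged fit on R = U D

U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U D V^T, D = diag(d)
K = X @ X.T
print(np.allclose(K, (U * d**2) @ U.T))   # K = U D^2 U^T: True

alpha_hat = U @ (beta_hat / d)            # alpha_hat = U D^{-1} beta_hat
k0 = X @ x0                               # inner products of x0 with rows of X
print(np.allclose(k0 @ alpha_hat, x0 @ Vt.T @ beta_hat))   # True
\end{verbatim}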
For any positive semi-definite kernel matrix \(\bb{K}\) the same approach works: the eigen-decomposition gives \(\bb{K}=\bU\bD^2\bU^T=\bb{R}\bb{R}^T\) with \(\bb{R}=\bU\bD\), so \(\bb{R}\) plays the role of the explicit data matrix above and \(\bb{X}\bb{X}^T\) is simply replaced by \(\bb{K}\). See Ex. 5.16 for more details.
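In a bit more detail (a brief sketch using \(r_0\) for the coordinates of the new point in the implicit feature basis, a notation not in the text, and assuming \(\bD\) is invertible): treating the rows of \(\bb{R}\) as the training features, the new point enters only through \(k_0=\bb{R}r_0\), so
\begin{align*}
r_0 = \bD^{-1}\bU^Tk_0
\quad\Longrightarrow\quad
\hat f_0 = r_0^T\hat\beta = k_0^T\bU\bD^{-1}\hat\beta = k_0^T\hat\alpha,
\end{align*}
the same formula as before, now computable from inner products alone. Any component of the new point orthogonal to the span of the training features contributes neither to \(k_0\) nor to the fitted function, so nothing is lost in this representation.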