Ex. 18.12

Suppose we wish to select the ridge parameter \(\lambda\) by 10-fold cross-validation in a \(p\gg N\) situation (for any linear model). We wish to use the computational shortcuts described in Section 18.3.5. Show that we need only reduce the \(N\times p\) matrix \(\bX\) to the \(N\times N\) matrix \(\bR\) once, and can use it in all the cross-validation runs.

Soln. 18.12

The \(N\times N\) matrix \(\bb{R}\) is constructed via the SVD of \(\bb{X}\) in (18.13), which writes \(\bb{X} = \bb{U}\bb{D}\bb{V}^T = \bb{R}\bb{V}^T\) with \(\bb{R}=\bb{U}\bb{D}\). For each observation \(x_i, i=1,...,N\), the \(i\)-th row of \(\bb{R}\) gives the corresponding reduced observation \(r_i = \bb{V}^Tx_i\).
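
As a concrete illustration, here is a minimal Python/NumPy sketch of this one-time reduction, using made-up dimensions (\(N=40\), \(p=1000\)) and synthetic data; the variable names are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 40, 1000                      # hypothetical p >> N setting
X = rng.standard_normal((N, p))      # stand-in for the training matrix
y = rng.standard_normal(N)           # stand-in for the responses

# One-time reduction (18.13): X = U D V^T = R V^T, with R = U D of size N x N
U, d, Vt = np.linalg.svd(X, full_matrices=False)
R = U * d                            # row i of R is r_i = V^T x_i
```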

To perform 10-fold cross-validation, we divide the training sample \(\bb{X}\) into 10 subsets \(N_j, j=1,...,10\), each of size roughly \(N/10\), and partition the rows of \(\bb{R}\) using the same indices. In each fold we set subset \(N_j\) aside and train on the remaining subsets. By the theorem described in (18.16)-(18.17) in the text, each training run (indexed by \(j=1,...,10\)) essentially becomes solving

\[\begin{equation} \underset{\beta_0, \beta}{\operatorname{argmin}}\sum_{i\notin N_j}L(y_i, \beta_0+x_i^T\beta) + \lambda \beta^T\beta,\non \end{equation}\]

which, as in (18.17), has the same optimal value when each \(x_i\) is replaced by its reduced counterpart \(r_i\) for \(i\notin N_j\), and whose solution is recovered as \(\hat\beta = \bb{V}\hat\theta\). Since the \(r_i\) are simply the rows of \(\bb{R}\), we only need to construct \(\bb{R}\) once and can reuse its rows in every cross-validation fold.
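
Continuing the sketch above (and assuming, for simplicity, squared-error loss and no intercept), the following hypothetical check fits one fold both on the original \(x_i\) (a \(p\)-dimensional ridge problem) and on the reduced \(r_i\) (an \(N\)-dimensional one), and confirms that the coefficients satisfy \(\hat\beta=\bb{V}\hat\theta\) and that the held-out predictions agree; the same \(\bb{R}\) therefore serves every fold.

```python
def ridge(Z, y, lam):
    """Closed-form ridge coefficients (squared-error loss, no intercept)."""
    q = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(q), Z.T @ y)

lam = 2.0
test = np.arange(N // 10)                 # one CV fold held out
train = np.arange(N // 10, N)             # remaining observations used for training

beta = ridge(X[train], y[train], lam)     # p-dimensional fit on the x_i
theta = ridge(R[train], y[train], lam)    # N-dimensional fit on the same rows of R

assert np.allclose(beta, Vt.T @ theta)               # beta_hat = V theta_hat
assert np.allclose(X[test] @ beta, R[test] @ theta)  # identical held-out predictions
```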