Ex. 18.12

Suppose we wish to select the ridge parameter λ by 10-fold cross-validation in a p ≫ N situation (for any linear model). We wish to use the computational shortcuts described in Section 18.3.5. Show that we need only to reduce the N×p matrix X to the N×N matrix R once, and can use it in all the cross-validation runs.

Soln. 18.12

The N×N matrix R is obtained from the SVD of X in (18.13): X = UDVᵀ = RVᵀ, where R = UD is N×N and V has orthonormal columns. Thus each observation xᵢ, i=1,...,N, corresponds to a reduced observation rᵢ (the i-th row of R).

To perform 10-fold cross-validation, we divide the training sample X into 10 subsets N_j, j=1,...,10, each of size (approximately) N/10, and partition the rows of R by the same indices. We set each subset N_j aside in turn and train on the remaining subsets. By the theorem stated in (18.16)–(18.17) in the text, each training run (indexed by j=1,...,10) essentially amounts to solving

argmin_{β₀,β} Σ_{i∉N_j} L(yᵢ, β₀ + xᵢᵀβ) + λβᵀβ,

which has the same optimal solution β̂ = Vθ̂ if we instead solve the reduced problem in θ using rᵢ for i∉N_j, as in (18.17). The key point is that the training design for fold j is a row subset of X, and that subset still factors as (rows of R)·Vᵀ with the same orthonormal V, so the theorem applies fold by fold with the same decomposition. Therefore we only need to construct R once.
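A small numerical sketch of this argument, assuming ridge regression with squared-error loss and no intercept for simplicity (the helper `ridge` and all variable names are illustrative, not from the text). We compute R from a single SVD of the full X, then check on one held-out fold that the p-dimensional fit on the xᵢ matches V times the N-dimensional fit on the rᵢ:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, lam = 30, 200, 1.0                 # p >> N setting
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

# Reduce X once: X = U D V^T = R V^T with R = U D (N x N), as in (18.13).
U, d, Vt = np.linalg.svd(X, full_matrices=False)
R = U * d                                # rows of R are the r_i

def ridge(Z, y, lam):
    """Ridge coefficients (no intercept) for design Z."""
    q = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(q), Z.T @ y)

# One CV fold: hold out the first 3 observations, train on the rest.
train = np.arange(3, N)
beta_full = ridge(X[train], y[train], lam)   # p-dimensional problem
theta = ridge(R[train], y[train], lam)       # N-dimensional problem
beta_reduced = Vt.T @ theta                  # map back: beta = V theta

print(np.allclose(beta_full, beta_reduced))  # the two fits agree
```

The same R (and the same V) is reused for every fold; only the row indices `train` change, which is exactly the claim of the exercise.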