Ex. 18.1
For the ridge-regression estimate \(\hat\beta\), let \(\hat\beta/\|\hat\beta\|_2\) be its normalized version. Show that as \(\lambda\ra\infty\), the normalized ridge-regression estimate converges to the renormalized partial-least-squares one-component estimate.
Soln. 18.1
Recall the singular value decomposition \(\bX=\bb{U}\bb{D}\bb{V}^T\). Here \(\bb{U}\) is an \(N\times p\) matrix with orthonormal columns, \(\bb{V}\) is a \(p\times p\) orthogonal matrix, and \(\bb{D}=\text{diag}(d_1, d_2,\dots,d_p)\) is a \(p\times p\) diagonal matrix of singular values.
Denote \(\bb{V} = \{v_{ij}\}\) and write
\[\begin{equation}
\bb{U} = \begin{pmatrix}
\bb{u}_1 &
\bb{u}_2 &
\dots &
\bb{u}_p
\end{pmatrix}.
\non
\end{equation}\]
We have
\[\begin{eqnarray}
\hat\beta &=& (\bX^T\bX + \lambda\bI)^{-1}\bX^T\by\non\\
&=&\left(\bb{V}\bb{D}^2\bb{V}^T + \lambda\bI\right)^{-1}\bb{V}\bb{D}\bb{U}^T\by\non\\
&=&\left(\bb{V}(\bb{D}^2 + \lambda\bI)\bb{V}^T\right)^{-1}\bb{V}\bb{D}\bb{U}^T\by\non\\
&=&\bb{V}(\bb{D}^2 + \lambda\bI)^{-1}\bb{D}\bb{U}^T\by.\non
\end{eqnarray}\]
Thus we can write
\[\begin{equation}
\hat\beta_j = \sum_{i=1}^p\frac{d_iv_{ji}\bb{u}_i^T\by}{d_i^2+\lambda}.\non
\end{equation}\]
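As a quick sanity check, here is a small numpy sketch confirming that the SVD form \(\bb{V}(\bb{D}^2+\lambda\bI)^{-1}\bb{D}\bb{U}^T\by\) matches the direct ridge solution; the synthetic data and the choices \(N=50\), \(p=5\), \(\lambda=2\) are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Sanity-check sketch: the SVD form V (D^2 + lam I)^{-1} D U^T y should match
# the direct ridge solution (X^T X + lam I)^{-1} X^T y.
# The shapes N=50, p=5 and lam=2.0 are arbitrary illustrative choices.
rng = np.random.default_rng(0)
N, p, lam = 50, 5, 2.0
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

U, d, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD: X = U diag(d) V^T
beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
beta_svd = Vt.T @ (d / (d**2 + lam) * (U.T @ y))   # V diag(d/(d^2+lam)) U^T y

print(np.allclose(beta_direct, beta_svd))          # True
```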
In turn, the squared norm of \(\hat\beta\) is
\[\begin{eqnarray}
\|\hat\beta\|_2^2 &=& \by^T\bb{U}\bb{D}(\bb{D}^2+\lambda\bI)^{-1}(\bb{D}^2+\lambda\bI)^{-1}\bb{D}\bb{U}^T\by\non\\
&=&(\bb{U}^T\by)^T[\bb{D}(\bb{D}^2+\lambda\bI)^{-2}\bb{D}](\bb{U}^T\by)\non\\
&=&\sum_{j=1}^p\frac{d_j^2(\bb{U}^T\by)_j^2}{(d_j^2 + \lambda)^2},\non
\end{eqnarray}\]
where \(\bb{D}(\bb{D}^2+\lambda\bI)^{-2}\bb{D}\) is a diagonal matrix with entries \(\frac{d_j^2}{(d_j^2 + \lambda)^2}\).
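A similar sketch (again with arbitrary synthetic data, an assumption for illustration) confirms this closed form for \(\|\hat\beta\|_2^2\).

```python
import numpy as np

# Sanity-check sketch for the closed form of ||beta_hat||_2^2.
# Synthetic data; shapes and lam are arbitrary illustrative choices.
rng = np.random.default_rng(1)
N, p, lam = 50, 5, 2.0
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

U, d, _ = np.linalg.svd(X, full_matrices=False)
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
closed_form = np.sum(d**2 * (U.T @ y)**2 / (d**2 + lam)**2)

print(np.isclose(beta @ beta, closed_form))        # True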
Now multiply by \(\lambda\): as \(\lambda\ra\infty\),
\[\begin{eqnarray}
\lambda\hat\beta_j &=& \sum_{i=1}^p\frac{\lambda}{d_i^2+\lambda}\,d_iv_{ji}\bb{u}_i^T\by\non\\
&\ra& \sum_{i=1}^p d_iv_{ji}\bb{u}_i^T\by\non\\
&=&(\bb{V}\bb{D}\bb{U}^T\by)_j\non\\
&=&(\bX^T\by)_j,\non
\end{eqnarray}\]
and, by the expression for \(\|\hat\beta\|_2^2\) above,
\[\begin{equation}
\lambda^2\|\hat\beta\|_2^2 = \sum_{j=1}^p\frac{\lambda^2 d_j^2(\bb{U}^T\by)_j^2}{(d_j^2 + \lambda)^2}\ra\sum_{j=1}^p d_j^2(\bb{U}^T\by)_j^2 = \|\bX^T\by\|_2^2.\non
\end{equation}\]
Therefore
\[\begin{equation}
\frac{\hat\beta}{\|\hat\beta\|_2} = \frac{\lambda\hat\beta}{\|\lambda\hat\beta\|_2}\ra\frac{\bX^T\by}{\|\bX^T\by\|_2},\non
\end{equation}\]
which is the renormalized partial-least-squares one-component estimate, since the first partial-least-squares direction is proportional to \(\bX^T\by\).
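Finally, a small sketch of the limit itself: for growing \(\lambda\), the normalized ridge estimate should line up with \(\bX^T\by/\|\bX^T\by\|_2\). The data and the \(\lambda\) grid below are arbitrary illustrative choices.

```python
import numpy as np

# Sketch of the limit: as lam grows, the normalized ridge estimate should
# approach X^T y / ||X^T y||_2. Data and the lam grid are arbitrary choices.
rng = np.random.default_rng(2)
N, p = 50, 5
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

pls_dir = X.T @ y
pls_dir = pls_dir / np.linalg.norm(pls_dir)

for lam in [1e0, 1e2, 1e4, 1e6]:
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    beta = beta / np.linalg.norm(beta)
    print(lam, np.linalg.norm(beta - pls_dir))     # gap shrinks toward 0
```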