Ex. 18.1
For the ridge-regression estimate \(\hat\beta\), let \(\hat\beta/\|\hat\beta\|_2\) be its normalized version. Show that as \(\lambda\ra\infty\), the normalized ridge-regression estimate converges to the renormalized partial-least-squares one-component estimate.
Soln. 18.1
Recall the singular value decomposition \(\bX=\bb{U}\bb{D}\bb{V}^T\). Here \(\bb{U}\) is an \(N\times p\) matrix with orthonormal columns, \(\bb{V}\) is a \(p\times p\) orthogonal matrix, and \(\bb{D}=\text{diag}(d_1, d_2,\dots,d_p)\) is a \(p\times p\) diagonal matrix of singular values.
Denote \(\bb{V} = \{v_{ij}\}\) and write
\[\begin{equation}
\bb{U} = \begin{pmatrix}
\bb{u}_1 &
\bb{u}_2 &
\dots &
\bb{u}_p
\end{pmatrix}.
\non
\end{equation}\]
We have
\[\begin{eqnarray}
\hat\beta &=& (\bX^T\bX + \lambda\bI)^{-1}\bX^T\by\non\\
&=&\left(\bb{V}\bb{D}^2\bb{V}^T + \lambda\bI\right)^{-1}\bb{V}\bb{D}\bb{U}^T\by\non\\
&=&\left(\bb{V}(\bb{D}^2 + \lambda\bI)\bb{V}^T\right)^{-1}\bb{V}\bb{D}\bb{U}^T\by\non\\
&=&\bb{V}(\bb{D}^2 + \lambda\bI)^{-1}\bb{D}\bb{U}^T\by.\non
\end{eqnarray}\]
Thus we can write
\[\begin{equation}
\hat\beta_j = \sum_{i=1}^p\frac{d_iv_{ji}\bb{u}_i^T\by}{d_i^2+\lambda}.\non
\end{equation}\]
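As a quick sanity check, here is a small numpy sketch confirming that the SVD form \(\bb{V}(\bb{D}^2+\lambda\bI)^{-1}\bb{D}\bb{U}^T\by\) matches the direct ridge solution; the synthetic data and the choices \(N=50\), \(p=5\), \(\lambda=2\) are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Sanity-check sketch: the SVD form V (D^2 + lam I)^{-1} D U^T y should match
# the direct ridge solution (X^T X + lam I)^{-1} X^T y.
# The shapes N=50, p=5 and lam=2.0 are arbitrary illustrative choices.
rng = np.random.default_rng(0)
N, p, lam = 50, 5, 2.0
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

U, d, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD: X = U diag(d) V^T
beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
beta_svd = Vt.T @ (d / (d**2 + lam) * (U.T @ y))   # V diag(d/(d^2+lam)) U^T y

print(np.allclose(beta_direct, beta_svd))          # True
```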
In turn, the squared norm of \(\hat\beta\) is
\[\begin{eqnarray}
\|\hat\beta\|_2^2 &=& \by^T\bb{U}\bb{D}(\bb{D}^2+\lambda\bI)^{-1}(\bb{D}^2+\lambda\bI)^{-1}\bb{D}\bb{U}^T\by\non\\
&=&(\bb{U}^T\by)^T[\bb{D}(\bb{D}^2+\lambda\bI)^{-2}\bb{D}](\bb{U}^T\by)\non\\
&=&\sum_{j=1}^p\frac{d_j^2(\bb{U}^T\by)_j^2}{(d_j^2 + \lambda)^2},\non
\end{eqnarray}\]
where \(\bb{D}(\bb{D}^2+\lambda\bI)^{-2}\bb{D}\) is a diagonal matrix with entries \(\frac{d_j^2}{(d_j^2 + \lambda)^2}\).
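A similar sketch (again with arbitrary synthetic data, an assumption for illustration) confirms this closed form for \(\|\hat\beta\|_2^2\).

```python
import numpy as np

# Sanity-check sketch for the closed form of ||beta_hat||_2^2.
# Synthetic data; shapes and lam are arbitrary illustrative choices.
rng = np.random.default_rng(1)
N, p, lam = 50, 5, 2.0
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

U, d, _ = np.linalg.svd(X, full_matrices=False)
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
closed_form = np.sum(d**2 * (U.T @ y)**2 / (d**2 + lam)**2)

print(np.isclose(beta @ beta, closed_form))        # True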
Now multiply by \(\lambda\): as \(\lambda\ra\infty\),
\[\begin{eqnarray}
\lambda\hat\beta_j &=& \sum_{i=1}^p\frac{\lambda}{d_i^2+\lambda}\,d_iv_{ji}\bb{u}_i^T\by\non\\
&\ra& \sum_{i=1}^p d_iv_{ji}\bb{u}_i^T\by\non\\
&=&(\bb{V}\bb{D}\bb{U}^T\by)_j\non\\
&=&(\bX^T\by)_j,\non
\end{eqnarray}\]
and, by the expression for \(\|\hat\beta\|_2^2\) above,
\[\begin{equation}
\lambda^2\|\hat\beta\|_2^2 = \sum_{j=1}^p\frac{\lambda^2 d_j^2(\bb{U}^T\by)_j^2}{(d_j^2 + \lambda)^2}\ra\sum_{j=1}^p d_j^2(\bb{U}^T\by)_j^2 = \|\bX^T\by\|_2^2.\non
\end{equation}\]
Therefore
\[\begin{equation}
\frac{\hat\beta}{\|\hat\beta\|_2} = \frac{\lambda\hat\beta}{\|\lambda\hat\beta\|_2}\ra\frac{\bX^T\by}{\|\bX^T\by\|_2},\non
\end{equation}\]
which is the renormalized partial-least-squares one-component estimate, since the first partial-least-squares direction is proportional to \(\bX^T\by\).
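Finally, a small sketch of the limit itself: for growing \(\lambda\), the normalized ridge estimate should line up with \(\bX^T\by/\|\bX^T\by\|_2\). The data and the \(\lambda\) grid below are arbitrary illustrative choices.

```python
import numpy as np

# Sketch of the limit: as lam grows, the normalized ridge estimate should
# approach X^T y / ||X^T y||_2. Data and the lam grid are arbitrary choices.
rng = np.random.default_rng(2)
N, p = 50, 5
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

pls_dir = X.T @ y
pls_dir = pls_dir / np.linalg.norm(pls_dir)

for lam in [1e0, 1e2, 1e4, 1e6]:
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    beta = beta / np.linalg.norm(beta)
    print(lam, np.linalg.norm(beta - pls_dir))     # gap shrinks toward 0
```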