Ex. 18.4
Derive the computational formula (18.15) for ridge regression. [Hint: Use the first derivative of the penalized sum-of-squares criterion to show that if \(\lambda > 0\), then \(\hat\beta = \bX^Ts\) for some \(s\in\mathbb{R}^N\).]
Soln. 18.4
By the SVD \(\bX=\bb{U}\bb{D}\bb{V}^T=\bb{R}\bb{V}^T\), where \(\bb{R}=\bb{U}\bb{D}\) is \(N\times N\), we have \(\bb{V}^T\bb{V}=\bb{I}\).
We need to show that
\[\begin{equation}
\label{eq:18-4a}
\hat\beta = \bb{V}(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1}\bb{R}^T\by.
\end{equation}\]
From the first-order condition of the penalized sum-of-squares criterion, \(\hat\beta\) solves the equation
\[\begin{equation}
-\bX^T(\by-\bX\hat\beta) + \lambda \hat\beta = 0.\non
\end{equation}\]
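For completeness, this is the first-derivative condition from the hint: differentiating the penalized criterion gives
\[\begin{equation}
\frac{\partial}{\partial \beta}\left[(\by-\bX\beta)^T(\by-\bX\beta) + \lambda\beta^T\beta\right] = -2\bX^T(\by-\bX\beta) + 2\lambda\beta,\non
\end{equation}\]
and setting this to zero and dividing by \(2\) yields the stationarity equation above. Note that it also gives \(\hat\beta = \bX^T(\by-\bX\hat\beta)/\lambda = \bX^Ts\) with \(s = (\by-\bX\hat\beta)/\lambda\), which is the representation suggested in the hint.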
It therefore suffices to plug \(\eqref{eq:18-4a}\) into the stationarity equation and verify that it indeed holds. To that end, we write
\[\begin{eqnarray}
\bX^T(\by-\bX\hat\beta) &=& \bX^T(\by-\bX\bb{V}(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1}\bb{R}^T\by) \non\\
&=& \bb{V}\bb{R}^T(\by - \bR\bV^T\bb{V}(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1}\bb{R}^T\by)\non\\
&=&\bV(\bR^T\by-\bR^T\bR(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1}\bb{R}^T\by)\non\\
&=&\bV(\bI-\bR^T\bR(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1})\bR^T\by\non\\
&=&\bV(\bI-(\bR^T\bR+\lambda \bI - \lambda \bI)(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1})\bR^T\by\non\\
&=&\bV(\lambda \bI(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1})\bR^T\by\non\\
&=&\lambda \bV(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1}\bR^T\by\non\\
&=&\lambda \hat\beta \quad \text{by } \eqref{eq:18-4a}.\non
\end{eqnarray}\]
That is, \(\hat\beta\) as given in \(\eqref{eq:18-4a}\) satisfies the stationarity equation; since the penalized criterion is strictly convex for \(\lambda>0\), this stationary point is the unique minimizer, which completes the proof.
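As a quick numerical sanity check (not part of the proof), here is a minimal numpy sketch comparing \(\eqref{eq:18-4a}\) with the direct ridge solution \((\bX^T\bX+\lambda\bb{I})^{-1}\bX^T\by\); the dimensions, seed, and value of \(\lambda\) are arbitrary choices. It also illustrates the computational point of (18.15): when \(p \gg N\), the linear system solved is \(N\times N\) rather than \(p\times p\).

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, lam = 20, 100, 2.5                           # p >> N: the regime where (18.15) helps
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

# Direct ridge estimate: solve the p x p system (X^T X + lam I_p) beta = X^T y.
beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Formula (18.15): reduce to an N x N system via the SVD X = U D V^T = R V^T.
U, d, Vt = np.linalg.svd(X, full_matrices=False)   # U: N x N, d: (N,), Vt: N x p
R = U * d                                          # R = U D, an N x N matrix
theta = np.linalg.solve(R.T @ R + lam * np.eye(N), R.T @ y)
beta_svd = Vt.T @ theta                            # beta = V theta

print(np.allclose(beta_direct, beta_svd))          # True: the two solutions agree
```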