Ex. 18.4
Derive the computational formula (18.15) for ridge regression. [Hint: Use the first derivative of the penalized sum-of-squares criterion to show that if \(\lambda > 0\), then \(\hat\beta = \bX^Ts\) for some \(s\in\mathbb{R}^N\).]
Soln. 18.4
By the SVD \(\bX=\bb{U}\bb{D}\bb{V}^T=\bb{R}\bb{V}^T\), where \(\bb{R}=\bb{U}\bb{D}\) is \(N\times N\), we have \(\bb{V}^T\bb{V}=\bb{I}\).
We need to show that
\[\begin{equation}
\label{eq:18-4a}
\hat\beta = \bb{V}(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1}\bb{R}^T\by.
\end{equation}\]
From the first-order condition of the penalized sum-of-squares criterion, \(\hat\beta\) solves the equation
\[\begin{equation}
-\bX^T(\by-\bX\hat\beta) + \lambda \hat\beta = 0.\non
\end{equation}\]
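For completeness, this is the first-derivative condition from the hint: differentiating the penalized criterion gives
\[\begin{equation}
\frac{\partial}{\partial \beta}\left[(\by-\bX\beta)^T(\by-\bX\beta) + \lambda\beta^T\beta\right] = -2\bX^T(\by-\bX\beta) + 2\lambda\beta,\non
\end{equation}\]
and setting this to zero and dividing by \(2\) yields the stationarity equation above. Note that it also gives \(\hat\beta = \bX^T(\by-\bX\hat\beta)/\lambda = \bX^Ts\) with \(s = (\by-\bX\hat\beta)/\lambda\), which is the representation suggested in the hint.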
It therefore suffices to plug \(\eqref{eq:18-4a}\) into the stationarity equation and verify that it indeed holds. To that end, we write
\[\begin{eqnarray}
\bX^T(\by-\bX\hat\beta) &=& \bX^T(\by-\bX\bb{V}(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1}\bb{R}^T\by) \non\\
&=& \bb{V}\bb{R}^T(\by - \bR\bV^T\bb{V}(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1}\bb{R}^T\by)\non\\
&=&\bV(\bR^T\by-\bR^T\bR(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1}\bb{R}^T\by)\non\\
&=&\bV(\bI-\bR^T\bR(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1})\bR^T\by\non\\
&=&\bV(\bI-(\bR^T\bR+\lambda \bI - \lambda \bI)(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1})\bR^T\by\non\\
&=&\bV(\lambda \bI(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1})\bR^T\by\non\\
&=&\lambda \bV(\bb{R}^T\bb{R}+\lambda \bb{I})^{-1}\bR^T\by\non\\
&=&\lambda \hat\beta \quad \text{by } \eqref{eq:18-4a}.\non
\end{eqnarray}\]
That is, \(\hat\beta\) as given in \(\eqref{eq:18-4a}\) satisfies the stationarity equation; since the penalized criterion is strictly convex for \(\lambda>0\), this stationary point is the unique minimizer, which completes the proof.
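As a quick numerical sanity check (not part of the proof), here is a minimal numpy sketch comparing \(\eqref{eq:18-4a}\) with the direct ridge solution \((\bX^T\bX+\lambda\bb{I})^{-1}\bX^T\by\); the dimensions, seed, and value of \(\lambda\) are arbitrary choices. It also illustrates the computational point of (18.15): when \(p \gg N\), the linear system solved is \(N\times N\) rather than \(p\times p\).

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, lam = 20, 100, 2.5                           # p >> N: the regime where (18.15) helps
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

# Direct ridge estimate: solve the p x p system (X^T X + lam I_p) beta = X^T y.
beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Formula (18.15): reduce to an N x N system via the SVD X = U D V^T = R V^T.
U, d, Vt = np.linalg.svd(X, full_matrices=False)   # U: N x N, d: (N,), Vt: N x p
R = U * d                                          # R = U D, an N x N matrix
theta = np.linalg.solve(R.T @ R + lam * np.eye(N), R.T @ y)
beta_svd = Vt.T @ theta                            # beta = V theta

print(np.allclose(beta_direct, beta_svd))          # True: the two solutions agree
```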