Ex. 3.6
Ex. 3.6
Show that the ridge regression estimate is the mean (and mode) of the posterior distribution, under a Gaussian prior \(\beta\sim N(0,\tau\textbf{I})\), and Gaussian sampling model \(\textbf{y}\sim N(\textbf{X}\beta, \sigma^2\textbf{I})\). Find the relationship between the regularization parameter \(\lambda\) in the ridge formula, and the variances \(\tau\) and \(\sigma^2\).
Soln. 3.6
By Bayes' theorem we have
\[\begin{eqnarray}
P(\beta|\textbf{y}) &=& \frac{P(\textbf{y}|\beta)P(\beta)}{P(\textbf{y})}\nonumber\\
&=&\frac{ N(\textbf{X}\beta, \sigma^2\textbf{I})N(0, \tau\textbf{I}) }{P(\textbf{y})}.\nonumber
\end{eqnarray}\]
Taking logarithm on both sides we get
\[\begin{eqnarray}
\ln(P(\beta|\textbf{y})) = -\frac{1}{2}\left(\frac{(\textbf{y}-\textbf{X}\beta)^T(\textbf{y}-\textbf{X}\beta)}{\sigma^2} + \frac{\beta^T\beta}{\tau}\right) + C,\nonumber
\end{eqnarray}\]
where \(C\) is a constant independent of \(\beta\). Therefore, if we let \(\lambda = \frac{\sigma^2}{\tau}\), then maximizing over \(\beta\) for \(P(\beta|\textbf{y})\) is equivalent to
\[\begin{equation}
\min_{\beta} (\textbf{y}-\textbf{X}\beta)^T(\textbf{y}-\textbf{X}\beta) + \lambda \beta^T\beta,\nonumber
\end{equation}\]
which is essentially solving a ridge regression.