Ex. 3.6

Show that the ridge regression estimate is the mean (and mode) of the posterior distribution, under a Gaussian prior \(\beta\sim N(0,\tau\textbf{I})\), and Gaussian sampling model \(\textbf{y}\sim N(\textbf{X}\beta, \sigma^2\textbf{I})\). Find the relationship between the regularization parameter \(\lambda\) in the ridge formula, and the variances \(\tau\) and \(\sigma^2\).

Soln. 3.6

By Bayes' theorem we have

\[\begin{eqnarray} P(\beta|\textbf{y}) &=& \frac{P(\textbf{y}|\beta)P(\beta)}{P(\textbf{y})}\nonumber\\ &=&\frac{ N(\textbf{X}\beta, \sigma^2\textbf{I})N(0, \tau\textbf{I}) }{P(\textbf{y})}.\nonumber \end{eqnarray}\]

Taking logarithm on both sides we get

\[\begin{eqnarray} \ln(P(\beta|\textbf{y})) = -\frac{1}{2}\left(\frac{(\textbf{y}-\textbf{X}\beta)^T(\textbf{y}-\textbf{X}\beta)}{\sigma^2} + \frac{\beta^T\beta}{\tau}\right) + C,\nonumber \end{eqnarray}\]

where \(C\) is a constant independent of \(\beta\). Therefore, if we let \(\lambda = \frac{\sigma^2}{\tau}\), then maximizing over \(\beta\) for \(P(\beta|\textbf{y})\) is equivalent to

\[\begin{equation} \min_{\beta} (\textbf{y}-\textbf{X}\beta)^T(\textbf{y}-\textbf{X}\beta) + \lambda \beta^T\beta,\nonumber \end{equation}\]

which is essentially solving a ridge regression.