Ex. 3.16
Derive the entries in Table 3.4, the explicit forms for estimators in the orthogonal case.
Soln. 3.16
In the orthogonal case, we have \(\bX^T\bX=\bI\).
(1) For best-subset, note that the QR decomposition of \(\bX\) can be written as \(\bX=\bX\bI\) since the columns of \(\bX\) are orthonormal. Also, \(\hat\beta=(\bX^T\bX)^{-1}\bX^T\by=\bX^T\by\). Then, from Ex. 3.9, we know that at each step we need to choose the variable \(k\) whose inclusion most reduces the residual sum of squares. Because \(\bX^T\bX=\bI\), the coefficients of the variables already in the model do not change when a new variable is added, and including variable \(j\) reduces the residual sum of squares by exactly \(\hat\beta_j^2\); hence the best subset of size \(M\) consists of the \(M\) variables with largest \(|\hat\beta_j|\).
Therefore we obtain the formula \(\hat\beta_j \cdot \bb{1}\big(|\hat\beta_j| \ge |\hat\beta_{(M)}|\big)\), where \(\hat\beta_{(M)}\) denotes the \(M\)-th largest coefficient in absolute value.
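A minimal numerical sketch of this claim, assuming NumPy is available (all dimensions and variable names below are illustrative): it compares a brute-force best-subset search of size \(M\) with hard-thresholding at the \(M\)-th largest \(|\hat\beta_j|\) for a matrix with orthonormal columns.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
N, p, M = 20, 5, 3

# Matrix with orthonormal columns (so X^T X = I), obtained from a reduced QR factorization.
X, _ = np.linalg.qr(rng.normal(size=(N, p)))
y = rng.normal(size=N)
beta_hat = X.T @ y                       # least-squares estimate in the orthonormal case

# Brute-force best subset of size M: minimize ||y - X_S beta_S||^2 over all |S| = M.
best_rss, best_S = np.inf, None
for S in combinations(range(p), M):
    X_S = X[:, list(S)]
    b_S = X_S.T @ y                      # orthonormal columns => subset OLS is just X_S^T y
    rss = np.sum((y - X_S @ b_S) ** 2)
    if rss < best_rss:
        best_rss, best_S = rss, S

# Table 3.4 claim: the optimal subset is the M coefficients largest in absolute value.
claim_S = tuple(sorted(np.argsort(-np.abs(beta_hat))[:M]))
print(best_S, claim_S)                   # the two index sets should coincide
```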
(2) For ridge regression, we know
\[
\hat\beta^{\text{ridge}} = (\bX^T\bX + \lambda\bI)^{-1}\bX^T\by = (1+\lambda)^{-1}\bX^T\by = \frac{\hat\beta}{1+\lambda},
\]
which is the desired formula.
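A short check of this formula, again assuming NumPy (the dimensions and the value of \(\lambda\) are arbitrary): the general ridge solution should coincide with \(\hat\beta/(1+\lambda)\) when \(\bX^T\bX=\bI\).

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, lam = 20, 5, 0.7

X, _ = np.linalg.qr(rng.normal(size=(N, p)))   # orthonormal columns: X^T X = I
y = rng.normal(size=N)
beta_hat = X.T @ y

# General ridge solution versus the orthonormal-case shortcut beta_hat / (1 + lambda).
ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.allclose(ridge, beta_hat / (1 + lam)))   # expected: True
```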
(3) For lasso, we are solving
\[
\min_{\beta}\; \frac{1}{2}\|\by-\bX\beta\|_2^2 + \lambda\|\beta\|_1.
\]
When \(\bX\) is orthogonal, recalling that \(\hat\beta = \bX^T\by\), we have
\[
\frac{1}{2}\|\by-\bX\beta\|_2^2 + \lambda\|\beta\|_1
= \frac{1}{2}\|\by\|_2^2 - \beta^T\bX^T\by + \frac{1}{2}\beta^T\bX^T\bX\beta + \lambda\|\beta\|_1
= \frac{1}{2}\|\by\|_2^2 + \sum_j\Big(\frac{1}{2}\beta_j^2 - \hat\beta_j\beta_j + \lambda|\beta_j|\Big).
\]
Therefore it suffices to solve the minimization for each \(j\) individually. Consider two cases:
a) When \(\hat\beta_j \ge 0\), we may restrict attention to \(\beta_j \ge 0\). Indeed, if an optimal solution had \(\beta^\ast_j < 0\), replacing it by \(\beta^{\text{new}}_j = -\beta^\ast_j > 0\) leaves the terms \(\frac{1}{2}\beta_j^2\) and \(\lambda|\beta_j|\) unchanged while the term \(-\hat\beta_j\beta_j\) does not increase (and strictly decreases when \(\hat\beta_j > 0\)), so the objective is no larger.
Therefore, we need to solve
\[
\min_{\beta_j \ge 0}\; \frac{1}{2}\beta_j^2 - \hat\beta_j\beta_j + \lambda\beta_j,
\]
which is quadratic in \(\beta_j\) and its solution is easily seen to be \((\hat\beta_j-\lambda)_+\).
b) When \(\hat\beta_j < 0\), the same sign-flipping argument lets us restrict to \(\beta_j \le 0\). Thus, we need to solve
\[
\min_{\beta_j \le 0}\; \frac{1}{2}\beta_j^2 - \hat\beta_j\beta_j - \lambda\beta_j,
\]
which has the solution \((\hat\beta_j + \lambda)_-\).
In both cases, the solution can be written as \(\text{sign}(\hat\beta_j)(|\hat\beta_j|-\lambda)_+\).
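A small numerical sketch of the soft-thresholding result, assuming NumPy and SciPy (values chosen arbitrarily): each separable one-dimensional objective is minimized numerically and compared with \(\text{sign}(\hat\beta_j)(|\hat\beta_j|-\lambda)_+\).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
N, p, lam = 20, 5, 0.3

X, _ = np.linalg.qr(rng.normal(size=(N, p)))   # orthonormal columns: X^T X = I
y = rng.normal(size=N)
beta_hat = X.T @ y

# Closed form from Table 3.4: coordinate-wise soft-thresholding of beta_hat.
soft = np.sign(beta_hat) * np.maximum(np.abs(beta_hat) - lam, 0.0)

# Numerical check: minimize each separable objective (1/2) b^2 - beta_hat_j * b + lam * |b|.
numeric = np.array([
    minimize_scalar(lambda b, bh=bh: 0.5 * b ** 2 - bh * b + lam * abs(b)).x
    for bh in beta_hat
])
print(np.allclose(soft, numeric, atol=1e-5))   # expected: True
```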