Ex. 18.5
Prove the theorem (18.16)–(18.17) in Section 18.3.5, by decomposing \(\beta\) and the rows of \(\bX\) into their projections onto the column space of \(\bb{V}\) and its orthogonal complement in \(\mathbb{R}^p\).
Soln. 18.5
Intuitively the proof follows by the change of variables \(\beta = \bb{V}\theta\). Here we recap the proof detailed in \cite{hastie2004efficient}.
Let \(\bV_\bot\) be a \(p\times (p-n)\) matrix whose columns span the orthogonal complement in \(\mathbb{R}^p\) of the column space of \(\bV\). Then \(\bb{Q} = (\bV : \bV_\bot)\) is a \(p\times p\) orthogonal matrix. Let \(x_i^\ast = \bb{Q}^Tx_i\) and \(\beta^\ast = \bb{Q}^T\beta\). Then
\[
x_i^T\beta = x_i^T\bb{Q}\bb{Q}^T\beta = (x_i^\ast)^T\beta^\ast
\quad\text{and}\quad
\beta^T\beta = \beta^T\bb{Q}\bb{Q}^T\beta = (\beta^\ast)^T\beta^\ast.
\]
Hence (18.16) is invariant under this orthogonal transformation: there is a one-to-one mapping between the locations of the minima, so we can work with \(\beta^\ast\) rather than \(\beta\). By the definition of \(\bV\) in \(\bX = \bR\bV^T\) we have \(x_i^T = r_i^T\bV^T\), and hence \((x_i^\ast)^T\beta^\ast = x_i^T\beta = r_i^T\beta_1^\ast\), where \(\beta_1^\ast\) consists of the first \(n\) elements of \(\beta^\ast\). Thus the loss part of (18.16) involves only \(\beta_0\) and \(\beta_1^\ast\). The quadratic penalty likewise splits into two terms, \(\lambda(\beta_1^\ast)^T\beta_1^\ast + \lambda(\beta_2^\ast)^T\beta_2^\ast\), so we can rewrite (18.16) as
\[
\min_{\beta_0,\,\beta_1^\ast}\;\sum_{i=1}^{n} L\bigl(y_i,\;\beta_0 + r_i^T\beta_1^\ast\bigr) + \lambda(\beta_1^\ast)^T\beta_1^\ast
\;+\;
\min_{\beta_2^\ast}\;\lambda(\beta_2^\ast)^T\beta_2^\ast,
\]
which we can minimize separately. The second term is minimized at \(\beta_2^\ast=0\), and the first term is identical to (18.17) with \(\theta_0=\beta_0\) and \(\theta = \beta_1^\ast\). From this equivalence, the solutions are related by \(\hat\beta = \bb{Q}\hat\beta^\ast = \bV\hat\beta_1^\ast = \bV\hat\theta\), as the theorem asserts.
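As a quick numerical sanity check of the theorem, the following NumPy sketch (our own construction, not from the text) takes squared-error loss with no intercept, so the penalized fit has a closed form, and verifies that the full \(p\)-dimensional ridge solution coincides with \(\bV\hat\theta\) from the reduced \(n\)-dimensional problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 10, 50, 0.7              # p >> n, ridge penalty lambda

X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Full p-dimensional ridge solution:
#   beta_hat = (X^T X + lam I_p)^{-1} X^T y
beta_full = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Thin SVD gives X = U D V^T, i.e. X = R V^T with R = U D (n x n),
# which is the factorization used in the proof.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
R = U * d                            # n x n matrix whose rows are the r_i^T

# Reduced n-dimensional ridge problem (18.17):
#   theta_hat = (R^T R + lam I_n)^{-1} R^T y
theta_hat = np.linalg.solve(R.T @ R + lam * np.eye(n), R.T @ y)

# Theorem: the full solution lies in the column space of V, beta_hat = V theta_hat
beta_reduced = Vt.T @ theta_hat
print(np.allclose(beta_full, beta_reduced))   # True
```

The agreement holds exactly (up to floating point), since \((\bX^T\bX + \lambda\bb{I})\bV\hat\theta = \bV(\bR^T\bR + \lambda\bb{I})\hat\theta = \bV\bR^Ty = \bX^Ty\).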