Ex. 18.5
Prove the theorem (18.16)–(18.17) in Section 18.3.5, by decomposing \(\beta\) and the rows of \(\bX\) into their projections onto the column space of \(\bb{V}\) and its orthogonal complement in \(\mathbb{R}^p\).
Soln. 18.5
Intuitively the proof follows by the change of variables \(\beta = \bb{V}\theta\). Here we recap the proof detailed in \cite{hastie2004efficient}.
Let \(\bV_\bot\) be a \(p\times (p-n)\) matrix whose columns span the orthogonal complement in \(\mathbb{R}^p\) of the column space of \(\bV\). Then \(\bb{Q} = (\bV : \bV_\bot)\) is a \(p\times p\) orthogonal matrix. Let \(x_i^\ast = \bb{Q}^Tx_i\) and \(\beta^\ast = \bb{Q}^T\beta\). Then
\[
x_i^T\beta = x_i^T\bb{Q}\bb{Q}^T\beta = (x_i^\ast)^T\beta^\ast
\quad\text{and}\quad
\beta^T\beta = \beta^T\bb{Q}\bb{Q}^T\beta = (\beta^\ast)^T\beta^\ast.
\]
Hence (18.16) is invariant under this orthogonal transformation: there is a one-to-one mapping between the locations of the minima, so we can work with \(\beta^\ast\) rather than \(\beta\). By the definition of \(\bV\) in \(\bX = \bR\bV^T\) we have \(x_i^T = r_i^T\bV^T\), and hence \((x_i^\ast)^T\beta^\ast = x_i^T\beta = r_i^T\beta_1^\ast\), where \(\beta_1^\ast\) consists of the first \(n\) elements of \(\beta^\ast\). Thus the loss part of (18.16) involves only \(\beta_0\) and \(\beta_1^\ast\). The quadratic penalty likewise splits into two terms, \(\lambda(\beta_1^\ast)^T\beta_1^\ast + \lambda(\beta_2^\ast)^T\beta_2^\ast\), so we can rewrite (18.16) as
\[
\min_{\beta_0,\,\beta_1^\ast}\;\sum_{i=1}^{n} L\bigl(y_i,\;\beta_0 + r_i^T\beta_1^\ast\bigr) + \lambda(\beta_1^\ast)^T\beta_1^\ast
\;+\;
\min_{\beta_2^\ast}\;\lambda(\beta_2^\ast)^T\beta_2^\ast,
\]
which we can minimize separately. The second term is minimized at \(\beta_2^\ast=0\), and the first term is identical to (18.17) with \(\theta_0=\beta_0\) and \(\theta = \beta_1^\ast\). From this equivalence, the solutions are related by \(\hat\beta = \bb{Q}\hat\beta^\ast = \bV\hat\beta_1^\ast = \bV\hat\theta\), as the theorem asserts.
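As a quick numerical sanity check of the theorem, the following NumPy sketch (our own construction, not from the text) takes squared-error loss with no intercept, so the penalized fit has a closed form, and verifies that the full \(p\)-dimensional ridge solution coincides with \(\bV\hat\theta\) from the reduced \(n\)-dimensional problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 10, 50, 0.7              # p >> n, ridge penalty lambda

X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Full p-dimensional ridge solution:
#   beta_hat = (X^T X + lam I_p)^{-1} X^T y
beta_full = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Thin SVD gives X = U D V^T, i.e. X = R V^T with R = U D (n x n),
# which is the factorization used in the proof.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
R = U * d                            # n x n matrix whose rows are the r_i^T

# Reduced n-dimensional ridge problem (18.17):
#   theta_hat = (R^T R + lam I_n)^{-1} R^T y
theta_hat = np.linalg.solve(R.T @ R + lam * np.eye(n), R.T @ y)

# Theorem: the full solution lies in the column space of V, beta_hat = V theta_hat
beta_reduced = Vt.T @ theta_hat
print(np.allclose(beta_full, beta_reduced))   # True
```

The agreement holds exactly (up to floating point), since \((\bX^T\bX + \lambda\bb{I})\bV\hat\theta = \bV(\bR^T\bR + \lambda\bb{I})\hat\theta = \bV\bR^Ty = \bX^Ty\).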