Ex. 3.23
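Consider a regression problem with all variables and response having mean zero and standard deviation one. Suppose also that each variable has identical absolute correlation with the response: \[\begin{equation} \frac{1}{N}|\langle \bx_j, \by\rangle| = \lambda, \ j=1,...,p.\non \end{equation}\]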
Let \(\hat \beta\) be the least-squares coefficient vector of \(\bb{y}\) on \(\bX\), and let \(\bb{u}(\alpha) = \alpha\bX\hat\beta\) for \(\alpha\in [0,1]\) be the vector that moves a fraction \(\alpha\) toward the least-squares fit \(\bb{u}\). Let \(\text{RSS}\) be the residual sum of squares from the full least-squares fit.
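To experiment with this setup numerically, it helps to have a data set that satisfies the assumptions exactly. The sketch below is one possible construction (our own, not part of the exercise): it standardizes a random \(\bX\), chooses the part of \(\by\) lying in the column space of \(\bX\) so that \(\frac{1}{N}\bX^T\by = \lambda\,\bb{s}\) for a random sign vector \(\bb{s}\), and then adds noise orthogonal to the intercept and to \(\bX\) so that \(\by\) has mean zero and standard deviation one (1/N convention). The helper names `make_equicorrelated` and `u_alpha` and the default values of `N`, `p`, `lam` are illustrative assumptions.

```python
import numpy as np


def make_equicorrelated(N=200, p=4, lam=0.3, seed=0):
    """Hypothetical helper: standardized X (N x p) and y with |<x_j, y>| / N = lam for all j."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((N, p))
    X = (X - X.mean(axis=0)) / X.std(axis=0)      # mean 0, sd 1 (1/N convention)
    signs = rng.choice([-1.0, 1.0], size=p)

    # Part of y in the column space of X, chosen so that X^T y / N = lam * signs.
    a = np.linalg.solve(X.T @ X, N * lam * signs)
    fit = X @ a

    # Noise orthogonal to the intercept and to every column of X.
    Z = np.column_stack([np.ones(N), X])
    e = rng.standard_normal(N)
    e -= Z @ np.linalg.lstsq(Z, e, rcond=None)[0]

    # Scale the noise so that <y, y> / N = 1; this needs <fit, fit> < N.
    room = N - fit @ fit
    if room <= 0:
        raise ValueError("lam is too large for this X")
    y = fit + e * np.sqrt(room / (e @ e))
    return X, y


def u_alpha(X, y, alpha):
    """u(alpha): move a fraction alpha toward the least-squares fit X @ beta_hat."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    return alpha * (X @ beta_hat)


X, y = make_equicorrelated()
N = X.shape[0]
print(np.abs(X.T @ y) / N)   # every entry equals lam = 0.3 up to rounding
print(y.mean(), y.std())     # approximately 0 and 1
```

Because the noise is orthogonal to \(\bX\), it changes neither \(\bX^T\by\) nor the least-squares coefficients, so it only serves to bring the variance of \(\by\) up to one.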
- (a) Show that \[\begin{equation} \frac{1}{N}|\langle \bx_j, \by-\bb{u}(\alpha)\rangle| = (1-\alpha)\lambda, \ j=1,...,p,\non \end{equation}\] and hence the correlations of each \(\bx_j\) with the residuals remain equal in magnitude as we progress toward \(\bb{u}\).
- (b) Show that these correlations are all equal to \[\begin{equation} \lambda(\alpha) = \frac{(1-\alpha)}{\sqrt{(1-\alpha)^2 + \frac{\alpha(2-\alpha)}{N}\cdot \text{RSS}}} \cdot \lambda,\non \end{equation}\] and hence they decrease monotonically to zero.
- (c) Use these results to show that the LAR algorithm in Section 3.4.4 keeps the correlations tied and monotonically decreasing, as claimed in (3.55).
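The identities claimed in (a) and (b) can be spot-checked numerically. Their derivation works one variable at a time, so the check does not need the equal-correlation assumption: for each \(\bx_j\) it compares \(\frac{1}{N}\langle \bx_j, \by-\bb{u}(\alpha)\rangle\) with \((1-\alpha)\) times the starting value, and the correlation along the path with the closed form of (b) applied to that variable's own starting correlation (under the exercise's assumption, all of these starting values equal \(\pm\lambda\)). This is a NumPy sketch; the random data, the sizes, and the helper names `path_correlations` and `lambda_formula` are illustrative assumptions.

```python
import numpy as np


def path_correlations(X, y, alphas):
    """Signed (1/N)<x_j, y - u(alpha)> and correlations, one row per alpha."""
    N = X.shape[0]
    fit = X @ np.linalg.lstsq(X, y, rcond=None)[0]    # full least-squares fit X beta_hat
    inner, corr = [], []
    for alpha in alphas:
        r = y - alpha * fit                            # residual y - u(alpha)
        inner.append(X.T @ r / N)
        corr.append(X.T @ r / (N * r.std()))           # sd(x_j) = 1 and r has mean 0
    return np.array(inner), np.array(corr)


def lambda_formula(lam, alpha, rss, N):
    """Closed form from part (b), applied to a starting correlation lam."""
    return (1 - alpha) * lam / np.sqrt((1 - alpha) ** 2 + alpha * (2 - alpha) * rss / N)


rng = np.random.default_rng(0)
N, p = 60, 4
X = rng.standard_normal((N, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)               # standardized predictors
y = rng.standard_normal(N)
y = (y - y.mean()) / y.std()                           # standardized response

alphas = np.linspace(0.0, 1.0, 101)
fit = X @ np.linalg.lstsq(X, y, rcond=None)[0]
rss = np.sum((y - fit) ** 2)
lam0 = X.T @ y / N                                     # correlations at alpha = 0

inner, corr = path_correlations(X, y, alphas)
err_a = np.max(np.abs(inner - (1 - alphas)[:, None] * lam0))
err_b = np.max(np.abs(corr - lambda_formula(lam0, alphas[:, None], rss, N)))
print(err_a, err_b)                                    # both zero up to rounding
print(np.all(np.diff(np.abs(corr), axis=0) < 0))       # True if every |correlation| strictly decreases
```

Here \(N > p\), so \(\text{RSS} > 0\), and the absolute correlations should fall strictly from their starting values to (numerically) zero at \(\alpha = 1\).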
Soln. 3.23
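- (a) Write \(\langle \bX, \bb{v}\rangle = \bX^T\bb{v}\) for a vector \(\bb{v}\), take absolute values componentwise, and assume \(\bX\) has full column rank, so that \(\hat\beta = \langle\bX, \bX\rangle^{-1}\langle\bX, \by\rangle\). Then \[\begin{eqnarray} \frac{1}{N}|\langle \bX, \by-\bb{u}(\alpha)\rangle| &=& \frac{1}{N}|\langle \bX, \by - \alpha\bX\langle\bX, \bX\rangle^{-1}\langle\bX, \by\rangle\rangle|\non\\ &=&\frac{1}{N}|\langle\bX, \by\rangle - \alpha \langle\bX, \bX\rangle\langle\bX, \bX\rangle^{-1}\langle\bX, \by\rangle|\non\\ &=&\frac{1}{N}|\langle\bX, \by\rangle - \alpha \langle\bX, \by\rangle|\non\\ &=&(1-\alpha) \frac{1}{N}|\langle\bX, \by\rangle|,\non \end{eqnarray}\] where the last step uses \(1-\alpha\ge 0\). Reading off the \(j\)th component and using \(\frac{1}{N}|\langle \bx_j, \by \rangle| = \lambda\), we have \[\begin{equation} \frac{1}{N}|\langle \bx_j, \by-\bb{u}(\alpha)\rangle| = (1-\alpha)\lambda, \ j=1,...,p.\non \end{equation}\]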
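- (b) The residual \(\by-\bb{u}(\alpha)\) has mean zero, being a linear combination of \(\by\) and the columns of \(\bX\), and each \(\bx_j\) satisfies \(\frac{1}{N}\langle \bx_j, \bx_j\rangle = 1\). Hence, by (a), the absolute correlations are \[\begin{equation} \frac{(1-\alpha)\lambda}{\sqrt{\frac{\langle \bx_j, \bx_j\rangle}{N}}\sqrt{\frac{\langle \by-\bb{u}(\alpha),\by-\bb{u}(\alpha)\rangle }{N}}}=\frac{(1-\alpha)\lambda}{\sqrt{\frac{\langle \by-\bb{u}(\alpha),\by-\bb{u}(\alpha)\rangle }{N}}}.\non \end{equation}\] It remains to compute \(\langle \by-\bb{u}(\alpha),\by-\bb{u}(\alpha)\rangle\). Let \(Q = \langle\bX, \by\rangle^T\langle \bX, \bX\rangle^{-1}\langle\bX, \by\rangle\). Since \(\langle \bX\hat\beta, \bX\hat\beta\rangle = \langle \by, \bX\hat\beta\rangle = Q\), the definition of \(\bb{u}(\alpha)\) gives \[\begin{eqnarray} \langle \by-\bb{u}(\alpha),\by-\bb{u}(\alpha)\rangle &=& \langle\by,\by\rangle + \alpha^2 Q - 2\alpha Q\non\\ &=&\langle\by,\by\rangle + (\alpha^2-2\alpha)Q.\non \end{eqnarray}\] On the other hand, \[\begin{equation} \text{RSS} = \langle \by, \by\rangle - Q.\non \end{equation}\] Therefore, using \(\frac{1}{N}\langle\by,\by\rangle = 1\), \[\begin{eqnarray} \frac{1}{N}\langle \by-\bb{u}(\alpha),\by-\bb{u}(\alpha)\rangle &=& \frac{1}{N}\langle\by,\by\rangle + \frac{1}{N}(\alpha^2-2\alpha)\left(\langle\by,\by\rangle-\text{RSS}\right)\non\\ &=&(\alpha-1)^2\frac{1}{N}\langle\by,\by\rangle + \frac{\alpha(2-\alpha)}{N}\text{RSS}\non\\ &=&(\alpha-1)^2+\frac{\alpha(2-\alpha)}{N}\text{RSS}.\non \end{eqnarray}\] Substituting this into the expression above gives \(\lambda(\alpha)\) as stated. For monotonicity, note that \(\alpha(2-\alpha) = 1-(1-\alpha)^2\), and write \(t = 1-\alpha\in[0,1]\) and \(c = \text{RSS}/N\), where \(0\le c\le \frac{1}{N}\langle\by,\by\rangle = 1\). Then, for \(t>0\), \[\begin{equation} \lambda(\alpha) = \frac{t\lambda}{\sqrt{(1-c)t^2 + c}} = \frac{\lambda}{\sqrt{(1-c) + c/t^2}}.\non \end{equation}\] If \(\text{RSS}>0\), so that \(c>0\), the denominator shrinks as \(t\) grows; hence \(\lambda(\alpha)\) decreases strictly as \(\alpha\) increases, from \(\lambda(0)=\lambda\) to \(\lambda(1)=0\). (If \(\text{RSS}=0\), the correlations stay at \(\lambda\) for \(\alpha<1\) and are undefined at \(\alpha=1\), where the residual vanishes.)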
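- (c) In LAR (Section 3.4.4), suppose that at the start of step \(k\) the active set is \(\mathcal{A}_k\), the current residual is \(\bb{r}_k\), and all variables in \(\mathcal{A}_k\) have the same absolute correlation \(\lambda_k\) with \(\bb{r}_k\). The step moves the fit by \(\alpha\bX_{\mathcal{A}_k}\delta_k\), where \(\delta_k\) is the least-squares coefficient of \(\bb{r}_k\) on \(\bX_{\mathcal{A}_k}\); this is exactly \(\bb{u}(\alpha)\) with \(\by\) replaced by \(\bb{r}_k\) and \(\bX\) replaced by \(\bX_{\mathcal{A}_k}\). The residual \(\bb{r}_k\) has mean zero, and since correlations are unchanged when \(\bb{r}_k\) is rescaled, we may take it to have standard deviation one. Parts (a) and (b) then show that the active correlations stay tied along the step, with common value \(\lambda_k(\alpha)\), which decreases monotonically from \(\lambda_k(0)=\lambda_k\) (and would reach \(\lambda_k(1)=0\) if the step ran to completion). The step ends when the correlation of some inactive variable catches up with this common value; that variable then joins the active set with the same absolute correlation, so the hypothesis holds again at step \(k+1\), and the common value is continuous across the step boundary. Since the hypothesis holds trivially at the first step, when only one variable is active, induction shows that the correlations of the active variables remain tied and decrease monotonically throughout the algorithm, as claimed in (3.55).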
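To see the claim of (c) in action, the sketch below implements a bare-bones LAR (no lasso modification) following the step of Section 3.4.4: each step moves the active coefficients by \(\alpha\delta_k\), where \(\delta_k\) is the least-squares coefficient of the current residual on the active columns, and stops at the first \(\alpha\) at which an inactive variable's correlation ties with the active ones. Since the active values of \(\frac{1}{N}\langle\bx_j,\bb{r}\rangle\) shrink as \((1-\alpha)C\) by (a), the tie for an inactive \(\bx_j\) solves \(|c_j - \alpha a_j| = (1-\alpha)C\). The function name `lar_path`, the tolerances, the grid, and the simulated data are our own illustrative choices; this is not a production implementation (it ignores exact ties and assumes \(N > p\)).

```python
import numpy as np


def lar_path(X, y, n_grid=20, tol=1e-12):
    """Minimal LAR sketch (no lasso step). Returns final coefficients and, for grid
    points inside each step, |correlations| of the active variables with the residual."""
    N, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                                  # current residual r_k
    c = X.T @ r / N                               # (1/N) <x_j, r_k>
    active = [int(np.argmax(np.abs(c)))]
    trace = []
    while True:
        XA = X[:, active]
        delta = np.linalg.lstsq(XA, r, rcond=None)[0]   # LS coefficient of r_k on X_A
        u = XA @ delta                            # fit moves by alpha * u
        C = np.max(np.abs(c[active]))             # common active value
        a = X.T @ u / N
        # Smallest alpha in (0, 1) with |c_j - alpha * a_j| = (1 - alpha) * C, j inactive.
        step, entering = 1.0, None
        for j in range(p):
            if j in active:
                continue
            for num, den in ((C - c[j], C - a[j]), (C + c[j], C + a[j])):
                if abs(den) > tol:
                    alpha = num / den
                    if tol < alpha < step:
                        step, entering = alpha, j
        # Record |correlations| of active variables on a grid inside this step.
        for alpha in np.linspace(0.0, step, n_grid, endpoint=False):
            res = r - alpha * u
            trace.append(np.abs(XA.T @ res) / (N * res.std()))
        beta[active] += step * delta
        r = r - step * u
        c = X.T @ r / N
        if entering is None:                      # all active: step ran to the full LS fit
            return beta, trace
        active.append(entering)


rng = np.random.default_rng(0)
N, p = 100, 6
X = rng.standard_normal((N, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ rng.standard_normal(p) + rng.standard_normal(N)
y = (y - y.mean()) / y.std()

beta, trace = lar_path(X, y)
print(max(t.max() - t.min() for t in trace))      # spread within the active set: zero up to rounding
```

Each entry of `trace` should have numerically zero spread, and its common value should never increase from one entry to the next. Once every variable is active, the last step runs with \(\alpha=1\), so `beta` should match the full least-squares coefficients.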