Ex. 3.23

Consider a regression problem with all variables and response having mean zero and standard deviation one. Suppose also that each variable has identical absolute correlation with the response:

\[\begin{equation} \frac{1}{N}|\langle \bb{x}_j, \bb{y} \rangle| = \lambda, \ j=1,...,p.\non \end{equation}\]

Let \(\hat \beta\) be the least-squares coefficient of \(\bb{y}\) on \(\bX\), and let \(\bb{u}(\alpha) = \alpha\bX\hat\beta\) for \(\alpha\in [0,1]\) be the vector that moves a fraction \(\alpha\) toward the least squares fit \(\bb{u}\). Let \(RSS\) be the residual sum-of-squares from the full least squares fit.

(a) Show that

\[\begin{equation} \frac{1}{N}|\langle \bx_j, \by-\bb{u}(\alpha)\rangle| = (1-\alpha)\lambda, \ j=1,...,p,\non \end{equation}\]

and hence the correlations of each \(\bx_j\) with the residuals remain equal in magnitude as we progress toward \(\bb{u}\).
(b) Show that these correlations are all equal to

\[\begin{equation} \lambda(\alpha) = \frac{(1-\alpha)}{\sqrt{(1-\alpha)^2 + \frac{\alpha(2-\alpha)}{N}\cdot RSS}} \cdot \lambda,\non \end{equation}\]

and hence they decrease monotonically to zero.
(c) Use these results to show that the LAR algorithm in Section 3.4.4 keeps the correlations tied and monotonically decreasing, as claimed in (3.55).

Soln. 3.23

(a) By definition we have

\[\begin{eqnarray} \frac{1}{N}|\langle \bX, \by-\bb{u}(\alpha)\rangle| &=& \frac{1}{N}|\langle \bX, \by - \alpha\bX\langle\bX, \bX\rangle^{-1}\langle\bX, \by\rangle|\non\\ &=&\frac{1}{N}|\langle\bX, \by\rangle - \alpha \langle\bX, \by\rangle|\non\\ &=&(1-\alpha) \frac{1}{N}|\langle\bX, \by\rangle|.\non \end{eqnarray}\]

Since \(\frac{1}{N}|\langle \bb{x}_j, \bb{y} \rangle| = \lambda\), we have

\[\begin{equation} \frac{1}{N}|\langle \bx_j, \by-\bb{u}(\alpha)\rangle| = (1-\alpha)\lambda, \ j=1,...,p.\non \end{equation}\]
(b) From (a), the correlations are

\[\begin{equation} \frac{(1-\alpha)\lambda}{\sqrt{\frac{\langle \bx_j, \bx_j\rangle}{N}}\sqrt{\frac{\langle \by-\bb{u}(\alpha),\by-\bb{u}(\alpha)\rangle }{N}}}=\frac{(1-\alpha)\lambda}{\sqrt{\frac{\langle \by-\bb{u}(\alpha),\by-\bb{u}(\alpha)\rangle }{N}}}.\non \end{equation}\]

We need to calculate \(\langle \by-\bb{u}(\alpha),\by-\bb{u}(\alpha)\rangle\). By definition of \(\bb{u}(\alpha)\), we have

\[\begin{eqnarray} \langle \by-\bb{u}(\alpha),\by-\bb{u}(\alpha)\rangle &=& \langle\by,\by\rangle + \alpha^2\langle \bX, \bX\rangle^{-1}\langle\bX, \by\rl^2-2\alpha\langle\bX, \bX\rl^{-1} \langle\bX, \by\rl^2\non\\ &=&\langle\by,\by\rangle + (\alpha^2-2\alpha)\langle\bX, \bX\rl^{-1} \langle\bX, \by\rl^2.\non \end{eqnarray}\]

On the other hand, we have

\[\begin{equation} \text{RSS} = \langle \by, \by\rl - \langle\bX,\bX\rl^{-1}\langle\bX, \by\rl^2.\non \end{equation}\]

So we have

\[\begin{eqnarray} \frac{1}{N}\langle \by-\bb{u}(\alpha),\by-\bb{u}(\alpha)\rangle &=& \frac{1}{N}\langle\by,\by\rangle + \frac{1}{N}(\alpha^2-2\alpha)\left(\langle\by,\by\rangle-\text{RSS}\right)\non\\ &=&(\alpha-1)^2\frac{1}{N}\langle\by,\by\rangle + \frac{(2\alpha-\alpha^2)}{N}\text{RSS}\non\\ &=&(\alpha-1)^2+\frac{\alpha(2-\alpha)}{N}\text{RSS}.\non \end{eqnarray}\]

The proof is now complete.
(c) When \(\alpha=0\) we have \(\lambda(0)=\lambda\); when \(\alpha=1\) we have \(\lambda(1)=0\), where all correlations are tied and decrease from \(\lambda\) to 0 as \(\alpha\) moves from 0 to 1.