Ex. 3.9
Forward stepwise regression. Suppose we have the QR decomposition for the \(N\times q\) matrix \(\textbf{X}_1\) in a multiple regression problem with response \(\textbf{y}\), and we have an additional \(p-q\) predictors in the matrix \(\textbf{X}_2\). Denote the current residual by \(\textbf{r}\). We wish to establish which one of these additional variables will reduce the residual sum-of-squares the most when included with those in \(\textbf{X}_1\). Describe an efficient procedure for doing this.
Soln. 3.9
Without loss of generality we assume \(\textbf{X}_1=(\textbf{x}_1, ...,\textbf{x}_q)\in\mathbb{R}^{N\times q}\). Write the QR decomposition for \(\textbf{X}_1\) as
\[\textbf{X}_1 = \textbf{Q}_1\textbf{R}_1,\]
where \(\textbf{Q}_1=(\textbf{q}_1, ..., \textbf{q}_q)\) is an \(N\times q\) matrix with orthonormal columns and \(\textbf{R}_1\) is a \(q\times q\) upper triangular matrix. The columns of \(\textbf{Q}_1\) span the column space of \(\textbf{X}_1\).
Now we consider choosing an additional predictor \(\textbf{x}_k\), where \(q < k\le p\). The projection of \(\textbf{x}_k\) onto the column space of \(\textbf{X}_1\), denoted by \(\mathcal{P}_{\textbf{X}_1}(\textbf{x}_k)\), is
\[\mathcal{P}_{\textbf{X}_1}(\textbf{x}_k) = \textbf{Q}_1\textbf{Q}_1^T\textbf{x}_k = \sum_{i=1}^{q}(\textbf{q}_i^T\textbf{x}_k)\textbf{q}_i.\]
Define
\[\textbf{q}_k = \frac{\textbf{x}_k - \mathcal{P}_{\textbf{X}_1}(\textbf{x}_k)}{\|\textbf{x}_k - \mathcal{P}_{\textbf{X}_1}(\textbf{x}_k)\|_2},\]
the normalized component of \(\textbf{x}_k\) orthogonal to the column space of \(\textbf{X}_1\).
If we add \(\textbf{x}_k\) to \(\textbf{X}_1\), the new fitted value, which is the projection of \(\textbf{y}\) onto the space spanned by \(\textbf{X}_1\) and \(\textbf{x}_k\), becomes
\[\hat{\textbf{y}}_{\text{new}} = \hat{\textbf{y}} + (\textbf{q}_k^T\textbf{y})\textbf{q}_k,\]
and thus the residual sum-of-squares is reduced by
\[(\textbf{q}_k^T\textbf{y})^2 = (\textbf{q}_k^T\textbf{r})^2,\]
where the equality holds because \(\textbf{q}_k\) is orthogonal to the column space of \(\textbf{X}_1\) and \(\textbf{r} = \textbf{y} - \hat{\textbf{y}}\).
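As a quick numerical sanity check, the sketch below (NumPy, on hypothetical random data) verifies that adding a single predictor \(\textbf{x}_k\) reduces the residual sum-of-squares by exactly \((\textbf{q}_k^T\textbf{y})^2\):

```python
import numpy as np

rng = np.random.default_rng(0)
N, q = 50, 3
X1 = rng.standard_normal((N, q))   # current predictors
xk = rng.standard_normal(N)        # candidate predictor
y = rng.standard_normal(N)

# Thin QR of X1; the columns of Q1 span col(X1).
Q1, R1 = np.linalg.qr(X1)

# q_k: the component of x_k orthogonal to col(X1), normalized.
z = xk - Q1 @ (Q1.T @ xk)
qk = z / np.linalg.norm(z)

def rss(X, y):
    """Residual sum-of-squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

# Decrease in RSS from adding x_k should equal (q_k^T y)^2.
decrease = rss(X1, y) - rss(np.column_stack([X1, xk]), y)
```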
Therefore, at step \(q\) we choose \(k\) such that
\[k^* = \arg\max_{q < k \le p} |\textbf{q}_k^T\textbf{r}|.\]
Note that the calculations above require only \(\textbf{q}_1, ..., \textbf{q}_q\), which are already available from \(\textbf{Q}_1\) in the QR decomposition. After we obtain \(\textbf{q}_k\) for the chosen predictor, we rename it \(\textbf{q}_{q+1}\), augment \(\textbf{Q}_1\) to \(\tilde{\textbf{Q}}_1=(\textbf{q}_1, ..., \textbf{q}_q, \textbf{q}_{q+1})\), and continue to the next iteration with \(q+1\) predictors.
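The full selection step can be sketched as follows (NumPy; the matrix sizes and random data are illustrative assumptions). Each candidate is orthogonalized against \(\textbf{Q}_1\) and scored by \(|\textbf{q}_k^T\textbf{y}|\), so no QR decomposition is recomputed from scratch:

```python
import numpy as np

rng = np.random.default_rng(1)
N, q, p = 60, 2, 6
X = rng.standard_normal((N, p))
y = rng.standard_normal(N)

X1 = X[:, :q]                 # predictors currently in the model
Q1, _ = np.linalg.qr(X1)      # thin QR; Q1 is N x q

r = y - Q1 @ (Q1.T @ y)       # current residual

# Orthogonalize each candidate against col(X1) and score |q_k^T y|.
best_k, best_score = None, -np.inf
for k in range(q, p):
    z = X[:, k] - Q1 @ (Q1.T @ X[:, k])
    qk = z / np.linalg.norm(z)
    score = abs(qk @ y)       # equals |q_k^T r|: q_k is orthogonal to col(X1)
    if score > best_score:
        best_k, best_score = k, score

# best_k indexes the predictor whose inclusion most reduces the RSS;
# its q_k can be appended to Q1 for the next iteration.
```

Each candidate costs \(O(Nq)\) work for the orthogonalization, versus refitting a full regression per candidate.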
Remark
The value of \(\textbf{R}_1\) plays no role in this selection procedure; only \(\textbf{Q}_1\) is needed.