Ex. 15.7
Suppose we fit a linear regression model to \(N\) observations with response \(y_i\) and predictors \(x_{i1}, \ldots, x_{ip}\). Assume that all variables are standardized to have mean zero and standard deviation one. Let \(RSS\) be the mean-squared residual on the training data, and \(\hat\beta\) the estimated coefficient. Denote by \(RSS_j^\ast\) the mean-squared residual on the training data using the same \(\hat\beta\), but with the \(N\) values for the \(j\)th variable randomly permuted before the predictions are calculated. Show that
\begin{equation}
E_P[RSS_j^\ast] = RSS + 2\hat\beta_j^2,
\end{equation}
where \(E_P\) denotes expectation with respect to the permutation distribution. Argue that this is approximately true when the evaluations are done using an independent test set.
Soln. 15.7
Denote by \(\bb{X}_j\) the matrix \(\bb{X}\) but with the \(j\)-th variable randomly permuted, and note that \(\bb{X}_j\) is random. We have
\begin{align}
RSS &= \frac{1}{N}(\bb{Y}-\bb{X}\hat\beta)^T(\bb{Y}-\bb{X}\hat\beta),\nonumber\\
RSS_j^\ast &= \frac{1}{N}(\bb{Y}-\bb{X}_j\hat\beta)^T(\bb{Y}-\bb{X}_j\hat\beta).\nonumber
\end{align}
Therefore,
\begin{align}
E_P[RSS_j^\ast] - RSS &= \frac{1}{N}E_P\left[(\bb{Y}-\bb{X}_j\hat\beta)^T(\bb{Y}-\bb{X}_j\hat\beta)-(\bb{Y}-\bb{X}\hat\beta)^T(\bb{Y}-\bb{X}\hat\beta)\right]\nonumber\\
&= \frac{1}{N}E_P\left[\hat\beta^T\bb{X}_j^T\bb{X}_j\hat\beta-\hat\beta^T\bb{X}^T\bb{X}\hat\beta-2\bb{Y}^T(\bb{X}_j-\bb{X})\hat\beta\right].\nonumber
\end{align}
Note that \(\bb{X}_j\) has the same elements as \(\bb{X}\) except in the \(j\)-th column, thus we can rewrite
\begin{equation}
\bb{X}_j = \bb{X} + (x_j^\ast - x_j)e_j^T,
\end{equation}
where \(x_j^\ast\) and \(x_j\) represent the \(j\)-th column of \(\bb{X}_j\) and \(\bb{X}\), respectively, and \(e_j\) is the \(j\)-th standard basis vector. That is, \(x_j^\ast\) is a permutation of \(x_j\).
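As a quick numerical sanity check of this rank-one representation (a minimal NumPy sketch; the sizes, seed, and column index are illustrative choices, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, j = 6, 3, 1                      # small illustrative sizes
X = rng.normal(size=(N, p))

Xj = X.copy()
Xj[:, j] = rng.permutation(X[:, j])    # X_j: column j permuted, all else unchanged

e_j = np.zeros(p)
e_j[j] = 1.0                           # j-th standard basis vector

# Verify X_j = X + (x_j^* - x_j) e_j^T as an outer-product (rank-one) update.
print(np.allclose(Xj, X + np.outer(Xj[:, j] - X[:, j], e_j)))  # True
```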
We also need to assume that the columns of \(\bb{X}\) are orthogonal, that is, \(\bb{X}^T\bb{X}=N\bb{I}\); the diagonal entries equal \(N\) because each column is standardized, so \(x_j^Tx_j=N\). Since every entry of \(E_P[x_j^\ast]\) equals \(\bar x_j = 0\), we have \(E_P[x_j^\ast]=\bb{0}\), the zero vector. Also, by definition \(\hat\beta=(\bb{X}^T\bb{X})^{-1}\bb{X}^T\bb{Y}=\frac{1}{N}\bb{X}^T\bb{Y}\), so \(\bb{Y}^Tx_j=N\hat\beta_j\). We have
\begin{align}
E_P\left[-2\bb{Y}^T(\bb{X}_j-\bb{X})\hat\beta\right] &= -2\hat\beta_jE_P\left[\bb{Y}^T(x_j^\ast-x_j)\right]\nonumber\\
&= -2\hat\beta_j\left(\bb{Y}^TE_P[x_j^\ast]-\bb{Y}^Tx_j\right)=2N\hat\beta_j^2.\nonumber
\end{align}
On the other hand, using the same representation of \(\bb{X}_j\), we can verify that
\begin{align}
E_P\left[\hat\beta^T\bb{X}_j^T\bb{X}_j\hat\beta-\hat\beta^T\bb{X}^T\bb{X}\hat\beta\right] &= E_P\left[2\hat\beta_j(x_j^\ast-x_j)^T\bb{X}\hat\beta+\hat\beta_j^2\|x_j^\ast-x_j\|^2\right]\nonumber\\
&= 2\hat\beta_j\left(0-N\hat\beta_j\right)+\hat\beta_j^2\cdot 2N = 0,\nonumber
\end{align}
since \(E_P[x_j^{\ast T}\bb{X}\hat\beta]=0\), \(x_j^T\bb{X}\hat\beta=N\hat\beta_j\), and \(E_P\|x_j^\ast-x_j\|^2=2x_j^Tx_j-2E_P[x_j^{\ast T}x_j]=2N\), all of which hold
under the assumption \(\bb{X}^T\bb{X}=N\bb{I}\). Combining the two displays,
\begin{equation}
E_P[RSS_j^\ast]-RSS=\frac{1}{N}\left(0+2N\hat\beta_j^2\right)=2\hat\beta_j^2,
\end{equation}
and the proof is complete. When the evaluations are done on an independent test set instead, \(\hat\beta\) is held fixed and the same algebra applies; the identity is then only approximate, because the test-set columns have mean and standard deviation only approximately \(0\) and \(1\), and the test residuals are no longer exactly uncorrelated with \(x_j\). These discrepancies vanish as the test-set size grows, so the relation holds approximately.
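The identity itself is easy to check by simulation. Below is a minimal sketch (the sizes, seed, coefficients, and the QR-based construction of an exactly orthogonal, standardized design are all illustrative choices, not part of the exercise); it compares the Monte Carlo average of \(RSS_j^\ast\) with \(RSS+2\hat\beta_j^2\):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, j = 400, 4, 2                      # illustrative sizes and target column

# Build a design whose columns are mean zero and exactly orthogonal, then
# rescale so each column has standard deviation one, i.e. X^T X = N * I.
Z = rng.normal(size=(N, p))
Z -= Z.mean(axis=0)                      # center the columns
Q, _ = np.linalg.qr(Z)                   # orthonormal columns, still mean zero
X = np.sqrt(N) * Q

Y = X @ np.array([1.0, -0.5, 0.8, 0.0]) + rng.normal(size=N)

beta = X.T @ Y / N                       # least squares when X^T X = N * I
RSS = np.mean((Y - X @ beta) ** 2)       # mean-squared training residual

# Monte Carlo estimate of E_P[RSS_j^*] over random permutations of column j.
rss_star = []
for _ in range(50_000):
    Xj = X.copy()
    Xj[:, j] = rng.permutation(X[:, j])
    rss_star.append(np.mean((Y - Xj @ beta) ** 2))

print(np.mean(rss_star))                 # Monte Carlo estimate of E_P[RSS_j^*]
print(RSS + 2 * beta[j] ** 2)            # value predicted by the identity
```

The two printed numbers agree up to Monte Carlo error, since the identity is exact in the permutation expectation on the training data.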