Ex. 15.7
Suppose we fit a linear regression model to \(N\) observations with response \(y_i\) and predictors \(x_{i1}, \ldots, x_{ip}\). Assume that all variables are standardized to have mean zero and standard deviation one. Let \(RSS\) be the mean-squared residual on the training data, and \(\hat\beta\) the estimated coefficient. Denote by \(RSS_j^\ast\) the mean-squared residual on the training data using the same \(\hat\beta\), but with the \(N\) values for the \(j\)th variable randomly permuted before the predictions are calculated. Show that
\begin{equation}
E_P[RSS_j^\ast] = RSS + 2\hat\beta_j^2,
\end{equation}
where \(E_P\) denotes expectation with respect to the permutation distribution. Argue that this is approximately true when the evaluations are done using an independent test set.
Soln. 15.7
Denote by \(\bb{X}_j\) the matrix \(\bb{X}\) but with the \(j\)-th variable randomly permuted, and note that \(\bb{X}_j\) is random. We have
\begin{align}
RSS &= \frac{1}{N}(\bb{Y}-\bb{X}\hat\beta)^T(\bb{Y}-\bb{X}\hat\beta),\nonumber\\
RSS_j^\ast &= \frac{1}{N}(\bb{Y}-\bb{X}_j\hat\beta)^T(\bb{Y}-\bb{X}_j\hat\beta).\nonumber
\end{align}
Therefore,
\begin{align}
E_P[RSS_j^\ast] - RSS &= \frac{1}{N}E_P\left[(\bb{Y}-\bb{X}_j\hat\beta)^T(\bb{Y}-\bb{X}_j\hat\beta)-(\bb{Y}-\bb{X}\hat\beta)^T(\bb{Y}-\bb{X}\hat\beta)\right]\nonumber\\
&= \frac{1}{N}E_P\left[\hat\beta^T\bb{X}_j^T\bb{X}_j\hat\beta-\hat\beta^T\bb{X}^T\bb{X}\hat\beta-2\bb{Y}^T(\bb{X}_j-\bb{X})\hat\beta\right].\nonumber
\end{align}
Note that \(\bb{X}_j\) has the same elements as \(\bb{X}\) except in the \(j\)-th column, thus we can rewrite
\begin{equation}
\bb{X}_j = \bb{X} + (x_j^\ast - x_j)e_j^T,
\end{equation}
where \(x_j^\ast\) and \(x_j\) represent the \(j\)-th column of \(\bb{X}_j\) and \(\bb{X}\), respectively, and \(e_j\) is the \(j\)-th standard basis vector. That is, \(x_j^\ast\) is a permutation of \(x_j\).
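As a quick numerical sanity check of this rank-one representation (a minimal NumPy sketch; the sizes, seed, and column index are illustrative choices, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, j = 6, 3, 1                      # small illustrative sizes
X = rng.normal(size=(N, p))

Xj = X.copy()
Xj[:, j] = rng.permutation(X[:, j])    # X_j: column j permuted, all else unchanged

e_j = np.zeros(p)
e_j[j] = 1.0                           # j-th standard basis vector

# Verify X_j = X + (x_j^* - x_j) e_j^T as an outer-product (rank-one) update.
print(np.allclose(Xj, X + np.outer(Xj[:, j] - X[:, j], e_j)))  # True
```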
We also need to assume that the columns of \(\bb{X}\) are orthogonal, that is, \(\bb{X}^T\bb{X}=N\bb{I}\); the diagonal entries equal \(N\) because each column is standardized, so \(x_j^Tx_j=N\). Since every entry of \(E_P[x_j^\ast]\) equals \(\bar x_j = 0\), we have \(E_P[x_j^\ast]=\bb{0}\), the zero vector. Also, by definition \(\hat\beta=(\bb{X}^T\bb{X})^{-1}\bb{X}^T\bb{Y}=\frac{1}{N}\bb{X}^T\bb{Y}\), so \(\bb{Y}^Tx_j=N\hat\beta_j\). We have
\begin{align}
E_P\left[-2\bb{Y}^T(\bb{X}_j-\bb{X})\hat\beta\right] &= -2\hat\beta_jE_P\left[\bb{Y}^T(x_j^\ast-x_j)\right]\nonumber\\
&= -2\hat\beta_j\left(\bb{Y}^TE_P[x_j^\ast]-\bb{Y}^Tx_j\right)=2N\hat\beta_j^2.\nonumber
\end{align}
On the other hand, using the same representation of \(\bb{X}_j\), we can verify that
\begin{align}
E_P\left[\hat\beta^T\bb{X}_j^T\bb{X}_j\hat\beta-\hat\beta^T\bb{X}^T\bb{X}\hat\beta\right] &= E_P\left[2\hat\beta_j(x_j^\ast-x_j)^T\bb{X}\hat\beta+\hat\beta_j^2\|x_j^\ast-x_j\|^2\right]\nonumber\\
&= 2\hat\beta_j\left(0-N\hat\beta_j\right)+\hat\beta_j^2\cdot 2N = 0,\nonumber
\end{align}
since \(E_P[x_j^{\ast T}\bb{X}\hat\beta]=0\), \(x_j^T\bb{X}\hat\beta=N\hat\beta_j\), and \(E_P\|x_j^\ast-x_j\|^2=2x_j^Tx_j-2E_P[x_j^{\ast T}x_j]=2N\), all of which hold
under the assumption \(\bb{X}^T\bb{X}=N\bb{I}\). Combining the two displays,
\begin{equation}
E_P[RSS_j^\ast]-RSS=\frac{1}{N}\left(0+2N\hat\beta_j^2\right)=2\hat\beta_j^2,
\end{equation}
and the proof is complete. When the evaluations are done on an independent test set instead, \(\hat\beta\) is held fixed and the same algebra applies; the identity is then only approximate, because the test-set columns have mean and standard deviation only approximately \(0\) and \(1\), and the test residuals are no longer exactly uncorrelated with \(x_j\). These discrepancies vanish as the test-set size grows, so the relation holds approximately.
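The identity itself is easy to check by simulation. Below is a minimal sketch (the sizes, seed, coefficients, and the QR-based construction of an exactly orthogonal, standardized design are all illustrative choices, not part of the exercise); it compares the Monte Carlo average of \(RSS_j^\ast\) with \(RSS+2\hat\beta_j^2\):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, j = 400, 4, 2                      # illustrative sizes and target column

# Build a design whose columns are mean zero and exactly orthogonal, then
# rescale so each column has standard deviation one, i.e. X^T X = N * I.
Z = rng.normal(size=(N, p))
Z -= Z.mean(axis=0)                      # center the columns
Q, _ = np.linalg.qr(Z)                   # orthonormal columns, still mean zero
X = np.sqrt(N) * Q

Y = X @ np.array([1.0, -0.5, 0.8, 0.0]) + rng.normal(size=N)

beta = X.T @ Y / N                       # least squares when X^T X = N * I
RSS = np.mean((Y - X @ beta) ** 2)       # mean-squared training residual

# Monte Carlo estimate of E_P[RSS_j^*] over random permutations of column j.
rss_star = []
for _ in range(50_000):
    Xj = X.copy()
    Xj[:, j] = rng.permutation(X[:, j])
    rss_star.append(np.mean((Y - Xj @ beta) ** 2))

print(np.mean(rss_star))                 # Monte Carlo estimate of E_P[RSS_j^*]
print(RSS + 2 * beta[j] ** 2)            # value predicted by the identity
```

The two printed numbers agree up to Monte Carlo error, since the identity is exact in the permutation expectation on the training data.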