Ex. 6.10

Suppose we have \(N\) samples generated from the model \(y_i=f(x_i)+\epsilon_i\), with \(\epsilon_i\) independent and identically distributed with mean zero and variance \(\sigma^2\), the \(x_i\) assumed fixed (non-random). We estimate \(f\) using a linear smoother (local regression, smoothing spline, etc.) with smoothing parameter \(\lambda\). Thus the vector of fitted values is given by \(\boldsymbol{\hat f} = \bb{S}_\lambda \bb{y}\). Consider the in-sample prediction error

\[\begin{equation} \text{PE}(\lambda) = E\left[\frac{1}{N} \sum_{i=1}^N(y_i^\ast-\hat f_\lambda(x_i))^2\right]\non \end{equation}\]

for predicting new responses at the \(N\) input values. Show that the average squared residual on the training data, ASR(\(\lambda\)), is a biased estimate (optimistic) of PE(\(\lambda\)), while

\[\begin{equation} C_\lambda = \text{ASR}(\lambda) + \frac{2\sigma^2}{N}\text{trace}(\bb{S}_\lambda)\non \end{equation}\]

is unbiased.

Soln. 6.10

The proof follows directly from Ex. 7.4 and Ex. 7.5 for general linear smoothers. Write \(\hat y_i = \hat f_\lambda(x_i)\), so that \(\text{ASR}(\lambda) = \frac{1}{N}\sum_{i=1}^N(y_i-\hat y_i)^2\). By Ex. 7.4, we know

\[\begin{equation} \text{PE}(\lambda) = E[\text{ASR}(\lambda)] + \frac{2}{N}\sum_{i=1}^N\text{Cov}(\hat y_i, y_i)\non \end{equation}\]
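For completeness, here is the one-step argument behind Ex. 7.4: since \(y_i^\ast = f(x_i) + \epsilon_i^\ast\) is independent of the training data while \(y_i\) is not,

\[\begin{eqnarray} E(y_i^\ast-\hat y_i)^2 &=& \sigma^2 + E(f(x_i)-\hat y_i)^2,\non\\ E(y_i-\hat y_i)^2 &=& \sigma^2 + E(f(x_i)-\hat y_i)^2 - 2\text{Cov}(\hat y_i, y_i),\non \end{eqnarray}\]

and subtracting the second line from the first, then averaging over \(i\), yields the display above.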

From Ex. 7.5 we also have

\[\begin{equation} \sum_{i=1}^N\text{Cov}(\hat y_i, y_i) = \text{trace}(\bb{S}_\lambda)\sigma^2.\non \end{equation}\]
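This identity is immediate for a linear smoother: with the \(x_i\) fixed we have \(\text{Cov}(\bb{y}) = \sigma^2\bb{I}\), so \(\text{Cov}(\boldsymbol{\hat y}, \bb{y}) = \text{Cov}(\bb{S}_\lambda\bb{y}, \bb{y}) = \bb{S}_\lambda\text{Cov}(\bb{y}) = \sigma^2\bb{S}_\lambda\), and summing its diagonal entries gives \(\sigma^2\,\text{trace}(\bb{S}_\lambda)\).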

Combining the two displays,

\[\begin{equation} E[\text{ASR}(\lambda)] = \text{PE}(\lambda) - \frac{2\sigma^2}{N}\text{trace}(\bb{S}_\lambda) < \text{PE}(\lambda),\non \end{equation}\]

since \(\text{trace}(\bb{S}_\lambda) > 0\) for any non-trivial smoother, so ASR(\(\lambda\)) is optimistic. On the other hand,

\[\begin{equation} E[C_\lambda] = E[\text{ASR}(\lambda)] + \frac{2\sigma^2}{N}\text{trace}(\bb{S}_\lambda) = \text{PE}(\lambda),\non \end{equation}\]

so \(C_\lambda\) is unbiased.
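As a numerical sanity check (not part of the proof), the identity can be verified by simulation. Below is a minimal sketch in Python, assuming a Gaussian kernel smoother as the linear smoother, with the bandwidth `bw` playing the role of \(\lambda\); the design, the true \(f\), and all constants are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: fixed design, true f, noise level, and bandwidth.
N, sigma, bw = 50, 1.0, 0.1
x = np.linspace(0.0, 1.0, N)
f = np.sin(2.0 * np.pi * x)

# A concrete linear smoother: Gaussian kernel smoother, so yhat = S @ y,
# with S depending only on the fixed x_i (rows normalized to sum to 1).
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bw) ** 2)
S = K / K.sum(axis=1, keepdims=True)

reps = 20000
asr = np.empty(reps)
pe = np.empty(reps)
for r in range(reps):
    y = f + sigma * rng.standard_normal(N)       # training responses
    y_star = f + sigma * rng.standard_normal(N)  # new responses at the same x_i
    y_hat = S @ y
    asr[r] = np.mean((y - y_hat) ** 2)           # ASR(lambda)
    pe[r] = np.mean((y_star - y_hat) ** 2)       # one draw of the PE average

c_lam = asr + 2.0 * sigma**2 * np.trace(S) / N   # C_lambda per replicate
print(f"PE(lambda)  ~ {pe.mean():.4f}")          # the target
print(f"E[ASR]      ~ {asr.mean():.4f}")         # smaller: optimistic
print(f"E[C_lambda] ~ {c_lam.mean():.4f}")       # matches PE up to MC error
```

Up to Monte Carlo error, the printed \(E[\text{ASR}]\) falls below \(\text{PE}(\lambda)\) by exactly \(2\sigma^2\text{trace}(\bb{S}_\lambda)/N\), while \(E[C_\lambda]\) matches \(\text{PE}(\lambda)\).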