Ex. 7.7

Use the approximation \(1/(1-x)^2 \approx 1 + 2x\) to expose the relationship between \(C_p/\text{AIC}\) (7.26) and GCV (7.52), the main difference being the model used to estimate the noise variance \(\sigma_\epsilon^2\).

Soln. 7.7

By (7.52) in the text, applying the approximation with \(x = \text{trace}(\bb{S})/N\), we have

\[\begin{eqnarray} \text{GCV}(\hat f) &=& \frac{1}{N}\sum_{i=1}^N\left(\frac{y_i-\hat f(x_i)}{1-\text{trace}(\bb{S})/N}\right)^2\non\\ &\approx& \frac{1}{N}\sum_{i=1}^N(y_i-\hat f(x_i))^2\left(1 + \frac{2\text{trace}(\bb{S})}{N}\right)\non\\ &=&\overline{\text{err}} + \frac{2\text{trace}(\bb{S})}{N^2}\sum_{i=1}^N(y_i-\hat f(x_i))^2\non\\ &\approx&\overline{\text{err}} + \frac{2\text{trace}(\bb{S})}{N}\hat\sigma^2_\epsilon.\non \end{eqnarray}\]

For \(C_p/\text{AIC}\), by (7.26) in the text, we have

\[\begin{equation} C_p = \overline{\text{err}} + 2 \cdot \frac{d}{N}\hat\sigma^2_\epsilon.\non \end{equation}\]

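Recall that \(\text{trace}(\bb{S})\) is the effective number of parameters (effective degrees of freedom) \(d\). Hence \(C_p/\text{AIC}\) and GCV have almost the same expression, and the main difference is how the noise variance \(\sigma^2_\epsilon\) is estimated: in the last step of the GCV derivation, the training error \(\overline{\text{err}}\) of the fitted model itself plays the role of \(\hat\sigma^2_\epsilon\), whereas \(C_p\) obtains \(\hat\sigma^2_\epsilon\) from the mean squared error of a low-bias model.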
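As a numerical sanity check, the sketch below compares the three quantities for one particular linear smoother. It is only an illustration under stated assumptions: ridge regression is chosen as the smoother \(\bb{S}\), and the low-bias model used for \(\hat\sigma^2_\epsilon\) in \(C_p\) is assumed to be the ordinary least squares fit on all predictors. The function name `gcv_and_cp` and the simulated data are hypothetical and not part of the text.

```python
import numpy as np


def gcv_and_cp(X, y, lam):
    """Return GCV, its first-order approximation, and C_p for ridge regression.

    Assumptions (not from the text): ridge is the linear smoother, and the
    low-bias model for the noise variance is the full OLS fit.
    """
    N, p = X.shape
    # Ridge smoother matrix S = X (X^T X + lam I)^{-1} X^T, so y_hat = S y.
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - S @ y
    err = np.mean(resid ** 2)   # training error, err-bar
    d = np.trace(S)             # effective number of parameters

    # Exact GCV and its linearization via 1/(1-x)^2 ~ 1 + 2x, x = trace(S)/N.
    gcv = np.mean((resid / (1.0 - d / N)) ** 2)
    gcv_approx = err * (1.0 + 2.0 * d / N)

    # C_p with sigma^2 estimated from the low-bias (OLS) model.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2_hat = np.sum((y - X @ beta_ols) ** 2) / (N - p)
    cp = err + 2.0 * d / N * sigma2_hat

    return {"err": err, "d": d, "gcv": gcv, "gcv_approx": gcv_approx, "cp": cp}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, p = 200, 5
    X = rng.normal(size=(N, p))
    y = X @ np.ones(p) + rng.normal(size=N)
    for name, value in gcv_and_cp(X, y, lam=10.0).items():
        print(f"{name}: {value:.4f}")
```

When \(\text{trace}(\bb{S})/N\) is small, `gcv` and `gcv_approx` should be close, and any remaining gap between `gcv_approx` and `cp` comes only from the two different estimates of \(\sigma^2_\epsilon\).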