Ex. 7.7
Ex. 7.7
Use the approximation \(1/(1-x)^2 \approx 1 + 2x\) to expose the relationship between \(C_p/\text{AIC}\) (7.26) and GCV (7.52), the main difference being the model used to estimate the noise variance \(\sigma_\epsilon^2\).
Soln. 7.7
By (7.52) in the text, we have
\[\begin{eqnarray}
\text{GCV}(\hat f) &=& \frac{1}{N}\sum_{i=1}^N\left(\frac{y_i-\hat f(x_i)}{1-\text{trace}(\bb{S})/N}\right)^2\non\\
&\approx& \frac{1}{N}\sum_{i=1}^N(y_i-\hat f(x_i))^2\left(1 + \frac{2\text{trace}(\bb{S})}{N}\right)\non\\
&=&\overline{\text{err}} + \frac{2\text{trace}(\bb{S})}{N^2}\sum_{i=1}^N(y_i-\hat f(x_i))^2\non\\
&\approx&\overline{\text{err}} + \frac{2\text{trace}(\bb{S})}{N}\hat\sigma^2_\epsilon.\non
\end{eqnarray}\]
For \(C_p/\text{AIC}\), by (7.26) in the text, we have
\[\begin{equation}
C_p = \overline{\text{err}} + 2 \cdot \frac{d}{N}\hat\sigma^2_\epsilon.\non
\end{equation}\]
Recall that \(\text{trace}(\bb{S})\) is the effective degree-of-freedom \(d\), therefore \(C_p/\text{AIC}\) and GCV have almost the same expression and the main difference is how to estimate the noise variance \(\sigma^2_\epsilon\).