Ex. 7.4
Consider the in-sample prediction error (7.18) and the training error \(\overline{\text{err}}\) in the case of squared-error loss:
\[\begin{eqnarray}
\text{Err}_{\text{in}} &=& \frac{1}{N}\sum_{i=1}^NE_{Y^0}(Y_i^0-\hat f(x_i))^2\non\\
\overline{\text{err}} &=& \frac{1}{N}\sum_{i=1}^N(y_i-\hat f(x_i))^2.\non
\end{eqnarray}\]
Add and subtract \(f(x_i)\) and \(E\hat f(x_i)\) in each expression and expand. Hence establish that the average optimism in the training error is
\[\begin{equation}
\frac{2}{N}\sum_{i=1}^N\text{Cov}(\hat y_i, y_i),\non
\end{equation}\]
as given in (7.21).
Soln. 7.4
We start with \(\text{Err}_{\text{in}}\). Let's denote \(\hat y_i = \hat f(x_i)\) and write
\[\begin{equation}
Y_i^0-\hat f(x_i) = Y_i^0-f(x_i) + f(x_i)-E\hat y_i + E\hat y_i -\hat y_i\non
\end{equation}\]
so that
\[\begin{eqnarray}
\text{Err}_{\text{in}} &=& \frac{1}{N}\sum_{i=1}^NE_{Y^0}\left(Y_i^0-f(x_i) + f(x_i)-E\hat y_i + E\hat y_i -\hat y_i\right)^2\non\\
&=&\frac{1}{N}\sum_{i=1}^N(A_i + B_i + C_i + D_i + E_i + F_i),\non
\end{eqnarray}\]
where
\[\begin{eqnarray}
A_i &=& E_{Y^0} (Y_i^0-f(x_i))^2\non\\
B_i &=& E_{Y^0} (f(x_i) - E\hat y_i)^2 = (f(x_i) - E\hat y_i)^2\non\\
C_i &=& E_{Y^0} (E\hat y_i-\hat y_i)^2 = (E\hat y_i-\hat y_i)^2\non\\
D_i &=& 2E_{Y^0} (Y_i^0-f(x_i))(f(x_i) - E\hat y_i)\non\\
E_i &=& 2E_{Y^0} (Y_i^0-f(x_i))(E\hat y_i-\hat y_i)\non\\
F_i &=& 2E_{Y^0} (f(x_i) - E\hat y_i)(E\hat y_i-\hat y_i) = 2(f(x_i) - E\hat y_i)(E\hat y_i-\hat y_i).\non
\end{eqnarray}\]
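Two of the cross terms vanish identically. Under the additive-error model \(Y = f(X)+\varepsilon\), \(E\varepsilon = 0\), assumed in the text (so that \(E_{Y^0}Y_i^0 = f(x_i)\)), and since \(\hat y_i\) is held fixed under the expectation over \(Y^0\),
\[\begin{eqnarray}
D_i &=& 2(f(x_i) - E\hat y_i)E_{Y^0}(Y_i^0-f(x_i)) = 0,\non\\
E_i &=& 2(E\hat y_i-\hat y_i)E_{Y^0}(Y_i^0-f(x_i)) = 0.\non
\end{eqnarray}\]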
Similarly for \(\overline{\text{err}}\) we have
\[\begin{equation}
y_i-\hat f(x_i) = y_i - f(x_i) +f(x_i) - E\hat y_i + E\hat y_i -\hat y_i\non
\end{equation}\]
and
\[\begin{eqnarray}
\overline{\text{err}} &=& \frac{1}{N}\sum_{i=1}^{N}(y_i - f(x_i) +f(x_i) - E\hat y_i + E\hat y_i -\hat y_i)^2\non\\
&=&\frac{1}{N}\sum_{i=1}^N (G_i + B_i + C_i + H_i + J_i + F_i),\non
\end{eqnarray}\]
where
\[\begin{eqnarray}
G_i &=& (y_i-f(x_i))^2\non\\
H_i &=& 2(y_i-f(x_i))(f(x_i) - E\hat y_i)\non\\
J_i &=& 2(y_i-f(x_i))(E\hat y_i -\hat y_i).\non
\end{eqnarray}\]
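By the same model assumption \(E_\bb{y}y_i = f(x_i)\), and since \(f(x_i) - E\hat y_i\) is a constant while \(y_i\) has the same distribution as \(Y_i^0\),
\[\begin{eqnarray}
E_\bb{y}H_i &=& 2(f(x_i) - E\hat y_i)E_\bb{y}(y_i-f(x_i)) = 0,\non\\
E_\bb{y}G_i &=& E_\bb{y}(y_i-f(x_i))^2 = E_{Y^0}(Y_i^0-f(x_i))^2 = A_i.\non
\end{eqnarray}\]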
Since \(B_i\), \(C_i\) and \(F_i\) are common to both expansions, they cancel in the difference, and we have
\[\begin{eqnarray}
E_\bb{y}(\text{op}) &=& E_\bb{y}(\text{Err}_{\text{in}} - \overline{\text{err}})\non\\
&=&\frac{1}{N}\sum_{i=1}^NE_\bb{y}[(A_i-G_i) + (D_i-H_i) + (E_i-J_i)].\non
\end{eqnarray}\]
Here \(A_i\) does not depend on \(\bb{y}\), so by the display above \(E_\bb{y}(A_i-G_i) = 0\); combined with \(D_i = E_i = 0\) and \(E_\bb{y}H_i = 0\), only the \(J_i\) term survives, and thus
\[\begin{eqnarray}
E_\bb{y}(\text{op}) &=& - \frac{1}{N}\sum_{i=1}^NE_\bb{y}J_i\non\\
&=& - \frac{2}{N}\sum_{i=1}^NE_\bb{y}\left[(y_i-f(x_i))(E\hat y_i -\hat y_i)\right]\non\\
&=&\frac{2}{N}\sum_{i=1}^N \left[E_\bb{y}(y_i\hat y_i) - E_\bb{y}y_iE_\bb{y}\hat y_i\right]\non\\
&=&\frac{2}{N}\sum_{i=1}^N\text{Cov}(\hat y_i, y_i),\non
\end{eqnarray}\]
where the third equality again uses \(E_\bb{y}(y_i-f(x_i)) = 0\). This is exactly (7.21).
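As a numerical sanity check (our addition, not part of the exercise), the sketch below simulates a fixed-design linear model fit by least squares, for which \(\sum_{i}\text{Cov}(\hat y_i, y_i) = \sigma^2\,\mathrm{tr}(H) = d\sigma^2\), so the average optimism should be close to \(2d\sigma^2/N\). The expectation over \(Y^0\) in \(\text{Err}_{\text{in}}\) is evaluated analytically as \((f(x_i)-\hat y_i)^2 + \sigma^2\); all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, sigma = 50, 5, 1.0             # sample size, predictors, noise sd
X = rng.normal(size=(N, d))          # fixed design, reused across repetitions
beta = rng.normal(size=d)
f = X @ beta                         # true f(x_i) at the training points

reps = 20000
op = np.empty(reps)
Yhat = np.empty((reps, N))           # y_hat_i across repetitions
Y = np.empty((reps, N))              # y_i across repetitions
for r in range(reps):
    y = f + sigma * rng.standard_normal(N)
    y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares fit
    err_bar = np.mean((y - y_hat) ** 2)
    # E_{Y^0}(Y_i^0 - y_hat_i)^2 = (f(x_i) - y_hat_i)^2 + sigma^2
    err_in = np.mean((f - y_hat) ** 2) + sigma ** 2
    op[r] = err_in - err_bar
    Yhat[r], Y[r] = y_hat, y

cov_sum = sum(np.cov(Yhat[:, i], Y[:, i])[0, 1] for i in range(N))
print("simulated E(op)          :", op.mean())
print("(2/N) sum_i Cov(yhat, y) :", 2 * cov_sum / N)
print("theory 2*d*sigma^2 / N   :", 2 * d * sigma**2 / N)
```

All three numbers should agree up to Monte Carlo error (about 0.2 with these settings).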