Ex. 7.3

Ex. 7.3

Let \(\boldsymbol{\hat f} = \bb{S}\bb{y}\) be a linear smoothing of \(\bb{y}\).

(a)

If \(S_{ii}\) is the \(i\)th diagonal element of \(\bb{S}\), show that for \(\bb{S}\) arising from least squares projections and cubic smoothing splines, the cross-validated residual can be written as

\[\begin{equation} \label{eq:ex73-a} y_i - \hat f^{-i}(x_i) = \frac{y_i - \hat f(x_i)}{1-S_{ii}}. \end{equation}\]
(b)

Use this result to show that \(|y_i-\hat f^{-i}(x_i)| \ge |y_i - \hat f(x_i)|\).

(c)

Find general conditions on any smoother \(\bb{S}\) to make result \(\eqref{eq:ex73-a}\) hold.

Soln. 7.3

Without loss of generality, we assume

\[\begin{equation} \bb{S} = \bb{X}(\bb{X}^T\bb{X} + \lambda \bm{\Omega})^{-1}\bb{X}^T.\non \end{equation}\]

For least squares we have \(\lambda=0\), and for cubic smoothing we have \(\lambda\ge 0\). See Chapters 3 & 5 in the text for more details.

(a)

We have

\[\begin{eqnarray} S_{ii} &=& x_i^T(\bb{X}^T\bb{X} + \lambda \bm{\Omega})^{-1}x_i,\non\\ \hat f(x_i) &=& x_i^T(\bb{X}^T\bb{X} + \lambda \bm{\Omega})^{-1}\bb{X}^T\bb{y}.\non \end{eqnarray}\]

Let \(\bb{X}_{-i}\) and \(\bb{y}_{-i}\) be the corresponding results with \(x_i\) removed, then we have

\[\begin{eqnarray} \hat f^{-i}(x_i) &=& x_i^T(\bb{X}_{-i}^T\bb{X}_{-i} + \lambda \bm{\Omega})^{-1}\bb{X}_{-i}^T\bb{y}_{-i}\non\\ &=&x_i^T(\bb{X}^T\bb{X} - x_ix_i^T + \lambda \bm{\Omega})^{-1}(\bb{X}^T\bb{y}-x_iy_i).\label{eq:73-1} \end{eqnarray}\]

Let \(\bb{A} = (\bb{X}^T\bb{X} + \lambda\Omega)\), by Woodbury matrix identity, we have

\[\begin{eqnarray} (\bb{A}-x_ix_i^T)^{-1} &=& \bb{A}^{-1} + \frac{\bb{A}^{-1}x_ix_i^T\bb{A}^{-1}}{1-x_i^T\bb{A}^{-1}x_i}.\non \end{eqnarray}\]

Therefore, \(\eqref{eq:73-1}\) becomes

\[\begin{eqnarray} \hat f^{-1}(x_i) &=& x_i^T\left(\bb{A}^{-1} + \frac{\bb{A}^{-1}x_ix_i^T\bb{A}^{-1}}{1-x_i^T\bb{A}^{-1}x_i}\right)(\bb{X}^T\bb{y}-x_iy_i)\non\\ &=&\left(x_i^T\bb{A}^{-1} + \frac{S_{ii}x_i^T\bb{A}^{-1}}{1-S_{ii}}\right)(\bb{X}^T\bb{y}-x_iy_i)\non\\ &=&x_i^T\bb{A}^{-1}\bb{X}^T\bb{y} - x_i^T\bb{A}^{-1}x_iy_i + \frac{S_{ii}x_i^T\bb{A}^{-1}\bb{X}^T\bb{y}}{1-S_{ii}} - \frac{S_{ii}x_i^T\bb{A}^{-1}x_iy_i}{1-S_{ii}}\non\\ &=&\hat f(x_i) -y_iS_{ii} + \frac{S_{ii}\hat f(x_i)}{1-S_{ii}} - \frac{y_iS_{ii}^2}{1-S_{ii}}\non\\ &=&\frac{\hat f(x_i) - y_i S_{ii}}{1-S_{ii}}.\non \end{eqnarray}\]

Therefore by simple algebra we have \(\eqref{eq:ex73-a}\).

(b)

Note that \(\bb{S} = \bb{X}(\bb{X}^T\bb{X} + \lambda \bm{\Omega})^{-1}\bb{X}^T\) is positive-semidefinite and has eigen-decomposition

\[\begin{equation} \bb{S} = \sum_{k=1}^N\rho_k(\lambda)\bb{u}_k\bb{u}_k^T.\non \end{equation}\]

See Section 5.4.1 in the text for more details. Therefore, we know that \(\bb{S}\bb{S} \preceq \bb{S}\), so that

\[\begin{equation} 0\le (S^2)_{ii} = \sum_{k\neq i}S_{ik}^2 + S_{ii}^2\le S_{ii}\non \end{equation}\]

from which we know \(0\le S_{ii} \le 1\).

By \(\eqref{eq:ex73-a}\) we have \(|y_i-\hat f^{-i}(x_i)| \ge |y_i - \hat f(x_i)|\).

(c)

For general linear smoother \(\boldsymbol{\hat f} = \bb{S}\bb{y}\), if \(\bb{S}\) only depends on \(\bb{X}\) and other tuning parameters (i.e., independent of \(\bb{y}\)), \(\eqref{eq:ex73-a}\) still holds.

To see that, note that if we replace \(y_i\) with \(\hat f^{-i}(x_i)\) (obtained by \(\eqref{eq:73-1}\)) in \(\textbf{y}\) and denote the new vector by \(\textbf{y}'\), \(\textbf{S}\) is not changed. Thus we have

\[\begin{eqnarray} \hat f^{-i}(x_i) &=&(\textbf{S}\textbf{y}')_i\non\\ &=& \sum_{i\neq j}S_{ij}\textbf{y}'_j + S_{ii}\hat f^{-i}(x_i)\non\\ &=&\hat f(x_i) - S_{ii}y_i + S_{ii}\hat f^{-i}(x_i),\non \end{eqnarray}\]

therefore we obtain \(\eqref{eq:ex73-a}\).