Soln. 6.3
Let's first introduce some notation. Define the vector-valued function \(b(x)^T = (1,x,x^2,...,x^d)\) for \(d\ge 1\). Let \(\bb{B}\) be the \(N\times (d+1)\) regression matrix with \(i\)th row \(b(x_i)^T\),
\[\begin{equation}
\label{eq:6-3B}
\bb{B} = \begin{pmatrix}
1 & x_{1} & \cdots & x^d_1 \\
1 & x_{2} & \cdots & x_{2}^d \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{N} & \cdots & x_{N}^d
\end{pmatrix} =
\begin{pmatrix}
b(x_1)^T\\
b(x_2)^T\\
\vdots\\
b(x_N)^T
\end{pmatrix}\in \mathbb{R}^{N\times (d+1)}\non
\end{equation}\]
and
\[\begin{equation}
\bb{B}^T = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_{1} & x_{2} & \cdots & x_{N} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1}^d & x_{2}^d & \cdots & x_{N}^d
\end{pmatrix} =
\begin{pmatrix}
b(x_1) & b(x_2) & \cdots & b(x_N)
\end{pmatrix}\in \mathbb{R}^{(d+1)\times N}.\non
\end{equation}\]
Let \(\bb{W}(x)\) be the \(N\times N\) diagonal matrix with \(i\)th diagonal element \(K_\lambda(x, x_i)\), that is,
\[\begin{equation}
\bW(x) = \begin{pmatrix}
K_\lambda(x,x_1) & 0 & \cdots & 0\\
0 & K_\lambda(x,x_2) & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & K_\lambda(x,x_N)
\end{pmatrix}\in \mathbb{R}^{N\times N}.\non
\end{equation}\]
Note that \(\bW(x) = \bW^T(x)\).
By definition of \(l(x)\) (see, e.g., (6.9) in the text), we have
\[\begin{eqnarray}
l(x_0)^T = b(x_0)^T(\bB^T\bW(x_0)\bB)^{-1}\bB^T\bW(x_0).\non
\end{eqnarray}\]
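To see this formula concretely, here is a minimal numpy sketch (the sample points, kernel, bandwidth, and degree below are all made up for illustration) that builds \(\bB\), \(\bW(x_0)\), and \(l(x_0)\), and checks the reproduction property \(l(x_0)^T\bB = b(x_0)^T\) implied by the definition.

```python
import numpy as np

# Illustrative setup (not from the text): N points on [0, 1], a degree-d local fit at x0,
# and a Gaussian kernel standing in for K_lambda.
rng = np.random.default_rng(0)
N, d, lam, x0 = 50, 3, 0.5, 0.3
x = np.sort(rng.uniform(0.0, 1.0, N))

B = np.vander(x, d + 1, increasing=True)                       # i-th row is b(x_i)^T
b = np.vander(np.array([x0]), d + 1, increasing=True).ravel()  # b(x0)
W = np.diag(np.exp(-0.5 * ((x - x0) / lam) ** 2))              # W(x0), diagonal kernel weights

# l(x0)^T = b(x0)^T (B^T W B)^{-1} B^T W, written here as a column vector:
l = W @ B @ np.linalg.solve(B.T @ W @ B, b)

# Sanity check: the equivalent-kernel weights reproduce degree-<=d polynomials,
# i.e. l(x0)^T B = b(x0)^T.
print(np.allclose(B.T @ l, b))
```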
To simplify notation, write \(b=b(x_0)\) and \(\bW = \bW(x_0)\) from now on. We have
\[\begin{eqnarray}
\|l(x_0)\|^2 &=& l(x_0)^Tl(x_0)\non\\
&=& b^T(\bB^T\bW\bB)^{-1}\bB^T\bW\bW^T\bB(\bB^T\bW\bB)^{-1} b.\label{eq:6-3a}
\end{eqnarray}\]
We need to show that \(\|l(x_0)\|^2\) is increasing in \(d\). The expression involves the kernel weight matrix \(\bW\); however, it turns out that \(\|l(x_0)\|^2\) does not depend on \(\bW\). Plugging \(\bb{I} = \bB\bB^T(\bB\bB^T)^{-1} = (\bB\bB^T)^{-1}\bB\bB^T\) between \(\bW\) and \(\bW^T\) in \(\eqref{eq:6-3a}\), we obtain
\[\begin{eqnarray}
&&\|l(x_0)\|^2\non\\
&=&b^T(\bB^T\bW\bB)^{-1}\bB^T\bW\bB\bB^T(\bB\bB^T)^{-1}(\bB\bB^T)^{-1}\bB\bB^T\bW^T\bB(\bB^T\bW\bB)^{-1}b\non\\
&=&b^T\bB^T(\bB\bB^T)^{-1}(\bB\bB^T)^{-1}\bB b,\non
\end{eqnarray}\]
therefore we see \(\|l(x_0)\|^2\) is independent of \(\bW\).
So we can take \(\bW = \bb{I}\) in \(\eqref{eq:6-3a}\), which gives
\[\begin{equation}
\label{eq:6-3c}
\|l(x_0)\|^2 = b^T(\bB^T\bB)^{-1}b.
\end{equation}\]
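As a tiny numerical check of \(\eqref{eq:6-3c}\) (again with made-up data), the quadratic form \(b^T(\bB^T\bB)^{-1}b\) agrees with \(\|l(x_0)\|^2\) computed directly from \(l(x_0) = \bB(\bB^T\bB)^{-1}b\) when \(\bW = \bb{I}\):

```python
import numpy as np

# Illustrative data only: compare ||l(x0)||^2 computed from l(x0) = B (B^T B)^{-1} b(x0)
# with the quadratic form b(x0)^T (B^T B)^{-1} b(x0) from (6-3c).
rng = np.random.default_rng(1)
N, d, x0 = 50, 3, 0.3
x = np.sort(rng.uniform(0.0, 1.0, N))

B = np.vander(x, d + 1, increasing=True)
b = np.vander(np.array([x0]), d + 1, increasing=True).ravel()

l = B @ np.linalg.solve(B.T @ B, b)            # l(x0) when W = I
print(np.allclose(l @ l, b @ np.linalg.solve(B.T @ B, b)))
```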
Now consider the case of degree \(d+1\). Denote
\[\begin{equation}
\hat\bB = \begin{pmatrix}
1 & x_{1} & \cdots & x^d_1 & x_1^{d+1} \\
1 & x_{2} & \cdots & x_{2}^d & x_2^{d+1}\\
\vdots & \vdots & \ddots & \vdots & \vdots \\
1 & x_{N} & \cdots & x_{N}^d & x_N^{d+1}
\end{pmatrix} =
\begin{pmatrix}
\bB & c
\end{pmatrix}
\in \mathbb{R}^{N\times (d+2)}\non
\end{equation}\]
where
\[\begin{equation}
c = \begin{pmatrix}
x_1^{d+1}\\
x_2^{d+1}\\
\vdots\\
x_N^{d+1}
\end{pmatrix}\in \mathbb{R}^{N\times 1}.\non
\end{equation}\]
Similarly, denote \(\hat b^T = (b^T, x_0^{d+1}) = (1,x_0,x_0^2,...,x_0^d, x_0^{d+1})\in \mathbb{R}^{1\times (d+2)}\). In that case, \(\eqref{eq:6-3c}\) becomes
\[\begin{equation}
\label{eq:6-3d}
\|\hat l(x_0)\|^2 = \hat b^T(\hat \bB^T \hat \bB)^{-1}\hat b.
\end{equation}\]
Further, we have
\[\begin{eqnarray}
\hat\bB^T\hat\bB =
\begin{pmatrix}
\bB^T \\
c^T
\end{pmatrix}
\begin{pmatrix}
\bB & c
\end{pmatrix} =
\begin{pmatrix}
\bB^T\bB & \bB^Tc\\
c^T\bB & c^Tc
\end{pmatrix}
\in \mathbb{R}^{(d+2)\times (d+2)}.\non
\end{eqnarray}\]
Note that \(c^Tc\in \mathbb{R}\) is a scalar. Recalling the block matrix inversion formula (via the Schur complement), we have
\[\begin{eqnarray}
&&(\hat \bB^T \hat \bB)^{-1}\non\\
&=&
\begin{pmatrix}
(\bB^T\bB)^{-1} + \frac{1}{k}(\bB^T\bB)^{-1}\bB^Tcc^T\bB(\bB^T\bB)^{-1} & -\frac{1}{k}(\bB^T\bB)^{-1}\bB^Tc\\
-\frac{1}{k}c^T\bB(\bB^T\bB)^{-1} & \frac{1}{k}
\end{pmatrix} \non
\end{eqnarray}\]
where
\[\begin{equation}
\label{eq:6-3k}
k = c^Tc - c^T\bB(\bB^T\bB)^{-1}\bB^Tc.
\end{equation}\]
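The block-inverse formula is easy to verify numerically; here is a short sketch with made-up data (a small degree is used only to keep the example well conditioned):

```python
import numpy as np

# Numerical check of the block-inverse formula above, with illustrative data.
rng = np.random.default_rng(2)
N, d = 50, 2
x = np.sort(rng.uniform(0.0, 1.0, N))

B = np.vander(x, d + 1, increasing=True)       # degree-d regression matrix
c = x ** (d + 1)                               # the extra column of hat(B)
Bhat = np.column_stack([B, c])                 # hat(B) = (B  c)

BtB_inv = np.linalg.inv(B.T @ B)
beta = BtB_inv @ (B.T @ c)
k = c @ c - c @ B @ beta                       # Schur complement (6-3k)

# Assemble the claimed inverse blockwise and compare with a direct inverse.
M = np.zeros((d + 2, d + 2))
M[:d + 1, :d + 1] = BtB_inv + np.outer(beta, beta) / k
M[:d + 1, d + 1] = -beta / k
M[d + 1, :d + 1] = -beta / k
M[d + 1, d + 1] = 1.0 / k
print(np.allclose(M, np.linalg.inv(Bhat.T @ Bhat)))
```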
Denote \(\beta = (\bB^T\bB)^{-1}\bB^Tc\in\mathbb{R}^{(d+1)\times 1}\). Plugging \((\hat \bB^T \hat \bB)^{-1}\) into \(\eqref{eq:6-3d}\), we get
\[\begin{eqnarray}
&&\|\hat l(x_0)\|^2 \non\\
&=&
\begin{pmatrix}
b^T & x_0^{d+1}
\end{pmatrix}
\begin{pmatrix}
(\bB^T\bB)^{-1} + \frac{1}{k}\beta\beta^T & -\frac{1}{k}\beta\\
-\frac{1}{k}\beta^T & \frac{1}{k}
\end{pmatrix}
\begin{pmatrix}
b\\
x_0^{d+1}
\end{pmatrix}\non\\
&=&b^T(\bB^T\bB)^{-1}b + \frac{1}{k}\left[b^T\beta\beta^Tb - x_0^{d+1}\beta^Tb - x_0^{d+1}b^T\beta+(x_0^{d+1})^2\right]\non\\
&=&b^T(\bB^T\bB)^{-1}b + \frac{1}{k}\left(x_0^{d+1}-b^T\beta\right)^2\ \ \text{ (note } b^T\beta \in \mathbb{R} )\non\\
&=&\|l(x_0)\|^2+ \frac{1}{k}\left(x_0^{d+1}-b^T\beta\right)^2.\non
\end{eqnarray}\]
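The last display, and the resulting monotonicity in the degree, can also be checked numerically; a sketch with illustrative data (the degree and sample points are arbitrary):

```python
import numpy as np

# Check that the degree-(d+1) quantity equals the degree-d quantity plus
# (x0^{d+1} - b^T beta)^2 / k, so ||l(x0)||^2 (with W = I) is nondecreasing in the degree.
rng = np.random.default_rng(3)
N, x0 = 50, 0.3
x = np.sort(rng.uniform(0.0, 1.0, N))

def q(deg):
    """b(x0)^T (B^T B)^{-1} b(x0) for a degree-deg fit, i.e. ||l(x0)||^2 with W = I."""
    B = np.vander(x, deg + 1, increasing=True)
    b = np.vander(np.array([x0]), deg + 1, increasing=True).ravel()
    return b @ np.linalg.solve(B.T @ B, b)

d = 3
B = np.vander(x, d + 1, increasing=True)
b = np.vander(np.array([x0]), d + 1, increasing=True).ravel()
c = x ** (d + 1)
beta = np.linalg.solve(B.T @ B, B.T @ c)
k = c @ c - c @ B @ beta

print(np.isclose(q(d + 1), q(d) + (x0 ** (d + 1) - b @ beta) ** 2 / k))
print([round(q(deg), 4) for deg in range(0, 6)])   # nondecreasing in the degree
```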
Therefore, it suffices to show that \(k > 0\), with \(k\) defined in \(\eqref{eq:6-3k}\). Since \(k = c^T\left(\bb{I}_N - \bB(\bB^T\bB)^{-1}\bB^T\right)c\), we only need to show
\[\begin{equation}
\label{eq:6-3e}
\bB(\bB^T\bB)^{-1}\bB^T \preceq \bb{I}_N.
\end{equation}\]
Consider the QR decomposition of \(\bB\)
\[\begin{equation}
\bB = \bb{Q}\bb{R}\non
\end{equation}\]
where \(\bb{Q}\) is an \(N\times (d+1)\) matrix with orthonormal columns, \(\bb{Q}^T\bb{Q} = \bb{I}_{d+1}\), and \(\bb{R}\in\mathbb{R}^{(d+1)\times (d+1)}\) is an upper triangular matrix. Then
\[\begin{equation}
\bB(\bB^T\bB)^{-1}\bB^T = \bb{Q}\bb{R}(\bb{R}^T\bb{R})^{-1}\bb{R}^T\bb{Q}^T = \bb{Q}\bb{Q}^T.\non
\end{equation}\]
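A quick numerical confirmation of this identity, using numpy's reduced-mode `np.linalg.qr` on made-up data:

```python
import numpy as np

# Check that the hat matrix B (B^T B)^{-1} B^T equals Q Q^T from the reduced QR of B.
rng = np.random.default_rng(4)
N, d = 50, 3
x = np.sort(rng.uniform(0.0, 1.0, N))
B = np.vander(x, d + 1, increasing=True)

Q, _ = np.linalg.qr(B)                             # reduced QR: Q is N x (d+1)
H = B @ np.linalg.solve(B.T @ B, B.T)              # the hat matrix B (B^T B)^{-1} B^T
print(np.allclose(H, Q @ Q.T))
```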
Extending \(\bb{Q}\) to an \(N\times N\) orthogonal matrix \((\bb{Q} \ \bb{Q}_1)\), we have
\[\begin{equation}
\bb{I}_N =
\begin{pmatrix}
\bb{Q} & \bb{Q}_1
\end{pmatrix}
\begin{pmatrix}
\bb{Q}^T\\
\bb{Q}_1^T
\end{pmatrix} = \bb{Q}\bb{Q}^T + \bb{Q}_1\bb{Q}_1^T.\non
\end{equation}\]
The result \(\eqref{eq:6-3e}\) follows by noting that \(\bb{Q}_1\bb{Q}_1^T\) is positive semi-definite, which gives \(k\ge 0\). Moreover, \(k\ne 0\): it is the Schur complement of \(\bB^T\bB\) in the invertible matrix \(\hat\bB^T\hat\bB\) (indeed \(\det(\hat\bB^T\hat\bB) = k\det(\bB^T\bB)\)), so \(k>0\). The proof is now complete.
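As a final sanity check with made-up data, the eigenvalues of \(\bb{I}_N - \bB(\bB^T\bB)^{-1}\bB^T\) are indeed nonnegative and \(k\) is strictly positive:

```python
import numpy as np

# Illustrative data: I_N - B (B^T B)^{-1} B^T has nonnegative eigenvalues (6-3e),
# and the Schur complement k = c^T (I_N - B (B^T B)^{-1} B^T) c is strictly positive.
rng = np.random.default_rng(5)
N, d = 50, 3
x = np.sort(rng.uniform(0.0, 1.0, N))

B = np.vander(x, d + 1, increasing=True)
c = x ** (d + 1)
H = B @ np.linalg.solve(B.T @ B, B.T)
resid = np.eye(N) - H

print(np.linalg.eigvalsh(resid).min() >= -1e-8)    # (6-3e): I_N - H is positive semi-definite
print(c @ resid @ c > 0)                           # hence k > 0, since c is not in the column space of B
```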