Soln. 6.3
Let's first introduce some notation. Define the vector-valued function \(b(x)^T = (1,x,x^2,...,x^d)\) for \(d\ge 1\). Let \(\bb{B}\) be the \(N\times (d+1)\) regression matrix with \(i\)th row \(b(x_i)^T\),
\[\begin{equation}
\label{eq:6-3B}
\bb{B} = \begin{pmatrix}
1 & x_{1} & \cdots & x^d_1 \\
1 & x_{2} & \cdots & x_{2}^d \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{N} & \cdots & x_{N}^d
\end{pmatrix} =
\begin{pmatrix}
b(x_1)^T\\
b(x_2)^T\\
\vdots\\
b(x_N)^T
\end{pmatrix}\in \mathbb{R}^{N\times (d+1)}\non
\end{equation}\]
and
\[\begin{equation}
\bb{B}^T = \begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_{1} & x_{2} & \cdots & x_{N} \\
\vdots & \vdots & \ddots & \vdots \\
x_{1}^d & x_{2}^d & \cdots & x_{N}^d
\end{pmatrix} =
\begin{pmatrix}
b(x_1) & b(x_2) & \cdots & b(x_N)
\end{pmatrix}\in \mathbb{R}^{(d+1)\times N}.\non
\end{equation}\]
Let \(\bb{W}(x)\) be the \(N\times N\) diagonal matrix with \(i\)th diagonal element \(K_\lambda(x, x_i)\), that is,
\[\begin{equation}
\bW(x) = \begin{pmatrix}
K_\lambda(x,x_1) & 0 & \cdots & 0\\
0 & K_\lambda(x,x_2) & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & K_\lambda(x,x_N)
\end{pmatrix}\in \mathbb{R}^{N\times N}.\non
\end{equation}\]
Note that \(\bW(x) = \bW^T(x)\).
By definition of \(l(x)\) (see, e.g., (6.9) in the text), we have
\[\begin{eqnarray}
l(x_0)^T = b(x_0)^T(\bB^T\bW(x_0)\bB)^{-1}\bB^T\bW(x_0).\non
\end{eqnarray}\]
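To see this formula concretely, here is a minimal numpy sketch (the sample points, kernel, bandwidth, and degree below are all made up for illustration) that builds \(\bB\), \(\bW(x_0)\), and \(l(x_0)\), and checks the reproduction property \(l(x_0)^T\bB = b(x_0)^T\) implied by the definition.

```python
import numpy as np

# Illustrative setup (not from the text): N points on [0, 1], a degree-d local fit at x0,
# and a Gaussian kernel standing in for K_lambda.
rng = np.random.default_rng(0)
N, d, lam, x0 = 50, 3, 0.5, 0.3
x = np.sort(rng.uniform(0.0, 1.0, N))

B = np.vander(x, d + 1, increasing=True)                       # i-th row is b(x_i)^T
b = np.vander(np.array([x0]), d + 1, increasing=True).ravel()  # b(x0)
W = np.diag(np.exp(-0.5 * ((x - x0) / lam) ** 2))              # W(x0), diagonal kernel weights

# l(x0)^T = b(x0)^T (B^T W B)^{-1} B^T W, written here as a column vector:
l = W @ B @ np.linalg.solve(B.T @ W @ B, b)

# Sanity check: the equivalent-kernel weights reproduce degree-<=d polynomials,
# i.e. l(x0)^T B = b(x0)^T.
print(np.allclose(B.T @ l, b))
```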
To simplify notation, write \(b=b(x_0)\) and \(\bW = \bW(x_0)\) from now on. We have
\[\begin{eqnarray}
\|l(x_0)\|^2 &=& l(x_0)^Tl(x_0)\non\\
&=& b^T(\bB^T\bW\bB)^{-1}\bB^T\bW\bW^T\bB(\bB^T\bW\bB)^{-1} b.\label{eq:6-3a}
\end{eqnarray}\]
We need to show that \(\|l(x_0)\|^2\) is increasing in \(d\). The expression involves the kernel weight matrix \(\bW\); however, it turns out that \(\|l(x_0)\|^2\) does not depend on \(\bW\). Plugging \(\bb{I} = \bB\bB^T(\bB\bB^T)^{-1} = (\bB\bB^T)^{-1}\bB\bB^T\) between \(\bW\) and \(\bW^T\) in \(\eqref{eq:6-3a}\), we obtain
\[\begin{eqnarray}
&&\|l(x_0)\|^2\non\\
&=&b^T(\bB^T\bW\bB)^{-1}\bB^T\bW\bB\bB^T(\bB\bB^T)^{-1}(\bB\bB^T)^{-1}\bB\bB^T\bW^T\bB(\bB^T\bW\bB)^{-1}b\non\\
&=&b^T\bB^T(\bB\bB^T)^{-1}(\bB\bB^T)^{-1}\bB b,\non
\end{eqnarray}\]
therefore we see \(\|l(x_0)\|^2\) is independent of \(\bW\).
So we can take \(\bW = \bb{I}\) in \(\eqref{eq:6-3a}\), which gives
\[\begin{equation}
\label{eq:6-3c}
\|l(x_0)\|^2 = b^T(\bB^T\bB)^{-1}b.
\end{equation}\]
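As a tiny numerical check of \(\eqref{eq:6-3c}\) (again with made-up data), the quadratic form \(b^T(\bB^T\bB)^{-1}b\) agrees with \(\|l(x_0)\|^2\) computed directly from \(l(x_0) = \bB(\bB^T\bB)^{-1}b\) when \(\bW = \bb{I}\):

```python
import numpy as np

# Illustrative data only: compare ||l(x0)||^2 computed from l(x0) = B (B^T B)^{-1} b(x0)
# with the quadratic form b(x0)^T (B^T B)^{-1} b(x0) from (6-3c).
rng = np.random.default_rng(1)
N, d, x0 = 50, 3, 0.3
x = np.sort(rng.uniform(0.0, 1.0, N))

B = np.vander(x, d + 1, increasing=True)
b = np.vander(np.array([x0]), d + 1, increasing=True).ravel()

l = B @ np.linalg.solve(B.T @ B, b)            # l(x0) when W = I
print(np.allclose(l @ l, b @ np.linalg.solve(B.T @ B, b)))
```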
Now consider the case of degree \(d+1\). Denote
\[\begin{equation}
\hat\bB = \begin{pmatrix}
1 & x_{1} & \cdots & x^d_1 & x_1^{d+1} \\
1 & x_{2} & \cdots & x_{2}^d & x_2^{d+1}\\
\vdots & \vdots & \ddots & \vdots & \vdots \\
1 & x_{N} & \cdots & x_{N}^d & x_N^{d+1}
\end{pmatrix} =
\begin{pmatrix}
\bB & c
\end{pmatrix}
\in \mathbb{R}^{N\times (d+2)}\non
\end{equation}\]
where
\[\begin{equation}
c = \begin{pmatrix}
x_1^{d+1}\\
x_2^{d+1}\\
\vdots\\
x_N^{d+1}
\end{pmatrix}\in \mathbb{R}^{N\times 1}.\non
\end{equation}\]
Similarly, denote \(\hat b^T = (b^T, x_0^{d+1}) = (1,x_0,x_0^2,...,x_0^d, x_0^{d+1})\in \mathbb{R}^{1\times (d+2)}\). In that case, \(\eqref{eq:6-3c}\) becomes
\[\begin{equation}
\label{eq:6-3d}
\|\hat l(x_0)\|^2 = \hat b^T(\hat \bB^T \hat \bB)^{-1}\hat b.
\end{equation}\]
Further, we have
\[\begin{eqnarray}
\hat\bB^T\hat\bB =
\begin{pmatrix}
\bB^T \\
c^T
\end{pmatrix}
\begin{pmatrix}
\bB & c
\end{pmatrix} =
\begin{pmatrix}
\bB^T\bB & \bB^Tc\\
c^T\bB & c^Tc
\end{pmatrix}
\in \mathbb{R}^{(d+2)\times (d+2)}.\non
\end{eqnarray}\]
Note that \(c^Tc\in \mathbb{R}\) is a scalar. Recalling the block matrix inversion formula (via the Schur complement), we have
\[\begin{eqnarray}
&&(\hat \bB^T \hat \bB)^{-1}\non\\
&=&
\begin{pmatrix}
(\bB^T\bB)^{-1} + \frac{1}{k}(\bB^T\bB)^{-1}\bB^Tcc^T\bB(\bB^T\bB)^{-1} & -\frac{1}{k}(\bB^T\bB)^{-1}\bB^Tc\\
-\frac{1}{k}c^T\bB(\bB^T\bB)^{-1} & \frac{1}{k}
\end{pmatrix} \non
\end{eqnarray}\]
where
\[\begin{equation}
\label{eq:6-3k}
k = c^Tc - c^T\bB(\bB^T\bB)^{-1}\bB^Tc.
\end{equation}\]
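The block-inverse formula is easy to verify numerically; here is a short sketch with made-up data (a small degree is used only to keep the example well conditioned):

```python
import numpy as np

# Numerical check of the block-inverse formula above, with illustrative data.
rng = np.random.default_rng(2)
N, d = 50, 2
x = np.sort(rng.uniform(0.0, 1.0, N))

B = np.vander(x, d + 1, increasing=True)       # degree-d regression matrix
c = x ** (d + 1)                               # the extra column of hat(B)
Bhat = np.column_stack([B, c])                 # hat(B) = (B  c)

BtB_inv = np.linalg.inv(B.T @ B)
beta = BtB_inv @ (B.T @ c)
k = c @ c - c @ B @ beta                       # Schur complement (6-3k)

# Assemble the claimed inverse blockwise and compare with a direct inverse.
M = np.zeros((d + 2, d + 2))
M[:d + 1, :d + 1] = BtB_inv + np.outer(beta, beta) / k
M[:d + 1, d + 1] = -beta / k
M[d + 1, :d + 1] = -beta / k
M[d + 1, d + 1] = 1.0 / k
print(np.allclose(M, np.linalg.inv(Bhat.T @ Bhat)))
```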
Denote \(\beta = (\bB^T\bB)^{-1}\bB^Tc\in\mathbb{R}^{(d+1)\times 1}\). Plugging \((\hat \bB^T \hat \bB)^{-1}\) into \(\eqref{eq:6-3d}\), we get
\[\begin{eqnarray}
&&\|\hat l(x_0)\|^2 \non\\
&=&
\begin{pmatrix}
b^T & x_0^{d+1}
\end{pmatrix}
\begin{pmatrix}
(\bB^T\bB)^{-1} + \frac{1}{k}\beta\beta^T & -\frac{1}{k}\beta\\
-\frac{1}{k}\beta^T & \frac{1}{k}
\end{pmatrix}
\begin{pmatrix}
b\\
x_0^{d+1}
\end{pmatrix}\non\\
&=&b^T(\bB^T\bB)^{-1}b + \frac{1}{k}\left[b^T\beta\beta^Tb - x_0^{d+1}\beta^Tb - x_0^{d+1}b^T\beta+(x_0^{d+1})^2\right]\non\\
&=&b^T(\bB^T\bB)^{-1}b + \frac{1}{k}\left(x_0^{d+1}-b^T\beta\right)^2\ \ \text{ (note } b^T\beta \in \mathbb{R} )\non\\
&=&\|l(x_0)\|^2+ \frac{1}{k}\left(x_0^{d+1}-b^T\beta\right)^2.\non
\end{eqnarray}\]
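The last display, and the resulting monotonicity in the degree, can also be checked numerically; a sketch with illustrative data (the degree and sample points are arbitrary):

```python
import numpy as np

# Check that the degree-(d+1) quantity equals the degree-d quantity plus
# (x0^{d+1} - b^T beta)^2 / k, so ||l(x0)||^2 (with W = I) is nondecreasing in the degree.
rng = np.random.default_rng(3)
N, x0 = 50, 0.3
x = np.sort(rng.uniform(0.0, 1.0, N))

def q(deg):
    """b(x0)^T (B^T B)^{-1} b(x0) for a degree-deg fit, i.e. ||l(x0)||^2 with W = I."""
    B = np.vander(x, deg + 1, increasing=True)
    b = np.vander(np.array([x0]), deg + 1, increasing=True).ravel()
    return b @ np.linalg.solve(B.T @ B, b)

d = 3
B = np.vander(x, d + 1, increasing=True)
b = np.vander(np.array([x0]), d + 1, increasing=True).ravel()
c = x ** (d + 1)
beta = np.linalg.solve(B.T @ B, B.T @ c)
k = c @ c - c @ B @ beta

print(np.isclose(q(d + 1), q(d) + (x0 ** (d + 1) - b @ beta) ** 2 / k))
print([round(q(deg), 4) for deg in range(0, 6)])   # nondecreasing in the degree
```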
Therefore, it suffices to show that \(k > 0\), with \(k\) defined in \(\eqref{eq:6-3k}\). Since \(k = c^T\left(\bb{I}_N - \bB(\bB^T\bB)^{-1}\bB^T\right)c\), we only need to show
\[\begin{equation}
\label{eq:6-3e}
\bB(\bB^T\bB)^{-1}\bB^T \preceq \bb{I}_N.
\end{equation}\]
Consider the QR decomposition of \(\bB\)
\[\begin{equation}
\bB = \bb{Q}\bb{R}\non
\end{equation}\]
where \(\bb{Q}\) is an \(N\times (d+1)\) matrix with orthonormal columns, \(\bb{Q}^T\bb{Q} = \bb{I}_{d+1}\), and \(\bb{R}\in\mathbb{R}^{(d+1)\times (d+1)}\) is an upper triangular matrix. Then
\[\begin{equation}
\bB(\bB^T\bB)^{-1}\bB^T = \bb{Q}\bb{R}(\bb{R}^T\bb{R})^{-1}\bb{R}^T\bb{Q}^T = \bb{Q}\bb{Q}^T.\non
\end{equation}\]
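A quick numerical confirmation of this identity, using numpy's reduced-mode `np.linalg.qr` on made-up data:

```python
import numpy as np

# Check that the hat matrix B (B^T B)^{-1} B^T equals Q Q^T from the reduced QR of B.
rng = np.random.default_rng(4)
N, d = 50, 3
x = np.sort(rng.uniform(0.0, 1.0, N))
B = np.vander(x, d + 1, increasing=True)

Q, _ = np.linalg.qr(B)                             # reduced QR: Q is N x (d+1)
H = B @ np.linalg.solve(B.T @ B, B.T)              # the hat matrix B (B^T B)^{-1} B^T
print(np.allclose(H, Q @ Q.T))
```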
Extending \(\bb{Q}\) to an \(N\times N\) orthogonal matrix \((\bb{Q} \ \bb{Q}_1)\), we have
\[\begin{equation}
\bb{I}_N =
\begin{pmatrix}
\bb{Q} & \bb{Q}_1
\end{pmatrix}
\begin{pmatrix}
\bb{Q}^T\\
\bb{Q}_1^T
\end{pmatrix} = \bb{Q}\bb{Q}^T + \bb{Q}_1\bb{Q}_1^T.\non
\end{equation}\]
The result \(\eqref{eq:6-3e}\) follows by noting that \(\bb{Q}_1\bb{Q}_1^T\) is positive semi-definite, which gives \(k\ge 0\). Moreover, \(k\ne 0\): it is the Schur complement of \(\bB^T\bB\) in the invertible matrix \(\hat\bB^T\hat\bB\) (indeed \(\det(\hat\bB^T\hat\bB) = k\det(\bB^T\bB)\)), so \(k>0\). The proof is now complete.
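As a final sanity check with made-up data, the eigenvalues of \(\bb{I}_N - \bB(\bB^T\bB)^{-1}\bB^T\) are indeed nonnegative and \(k\) is strictly positive:

```python
import numpy as np

# Illustrative data: I_N - B (B^T B)^{-1} B^T has nonnegative eigenvalues (6-3e),
# and the Schur complement k = c^T (I_N - B (B^T B)^{-1} B^T) c is strictly positive.
rng = np.random.default_rng(5)
N, d = 50, 3
x = np.sort(rng.uniform(0.0, 1.0, N))

B = np.vander(x, d + 1, increasing=True)
c = x ** (d + 1)
H = B @ np.linalg.solve(B.T @ B, B.T)
resid = np.eye(N) - H

print(np.linalg.eigvalsh(resid).min() >= -1e-8)    # (6-3e): I_N - H is positive semi-definite
print(c @ resid @ c > 0)                           # hence k > 0, since c is not in the column space of B
```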