Ex. 3.1

Show that the \(F\) statistic (3.13) for dropping a single coefficient from a model is equal to the square of the corresponding \(z\)-score (3.12).

Soln. 3.1

First recall that (see the paragraph below (3.8) in the text)

\[\begin{equation} \hat\sigma^2 = RSS_1/(N-p_1-1).\nonumber \end{equation}\]

By (3.12) in the text, \(z_j=\hat\beta_j/(\hat\sigma\sqrt{\nu_{jj}})\), where \(\nu_{jj}\) denotes the \(j\)-th diagonal element of \((\textbf{X}^T\textbf{X})^{-1}\). When a single coefficient is dropped we have \(p_1-p_0=1\), so the \(F\) statistic (3.13) reduces to

\[\begin{equation} F = \frac{(RSS_0-RSS_1)/(p_1-p_0)}{RSS_1/(N-p_1-1)} = \frac{RSS_0-RSS_1}{\hat\sigma^2}.\nonumber \end{equation}\]

Since \(z_j^2 = \hat\beta_j^2/(\hat\sigma^2\nu_{jj})\), it suffices to show that

\[\begin{equation} RSS_0 - RSS_1 = \frac{\hat\beta_j^2}{\nu_{jj}}.\nonumber \end{equation}\]

We already know how to find \(RSS_1\), the residual sum-of-squares of the original least squares model. To find \(RSS_0\), the residual sum-of-squares when the \(j\)-th coefficient is dropped from the original model, denote by \(e_j = (0,..., 1, ..., 0)^T\in\mathbb{R}^{(p+1)\times 1}\) the vector with a \(1\) in the \(j\)-th position and zeros elsewhere; we need to solve

\[\begin{eqnarray} \min_{\beta\in \mathbb{R}^{(p+1)\times 1}}&& (\textbf{y}-\textbf{X}\beta)^T(\textbf{y}-\textbf{X}\beta)\nonumber\\ \text{s.t.}&& e_j^T\beta=0.\nonumber \end{eqnarray}\]
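As a quick numerical illustration (a minimal sketch with simulated data in numpy; the names `X`, `y` and the choice `j = 2` are illustrative, not from the text), note that imposing \(\beta_j=0\) is equivalent to refitting with the \(j\)-th column of \(\textbf{X}\) removed, which gives a direct way to compute \(RSS_0\):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 5                                   # sample size and number of predictors
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, p))])   # design matrix with intercept
y = X @ rng.normal(size=p + 1) + rng.normal(size=N)          # synthetic response
j = 2                                           # coefficient to drop (arbitrary choice)

# Unconstrained fit gives RSS_1
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
RSS1 = np.sum((y - X @ beta_hat) ** 2)

# The constrained problem (beta_j = 0) is equivalent to refitting without column j
X0 = np.delete(X, j, axis=1)
beta0 = np.linalg.lstsq(X0, y, rcond=None)[0]
RSS0 = np.sum((y - X0 @ beta0) ** 2)

print(RSS1, RSS0)                               # RSS0 >= RSS1: the constrained model fits no better
```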

The Lagrangian of the constrained problem is

\[\begin{equation} L(\beta,\lambda) = (\textbf{y}-\textbf{X}\beta)^T(\textbf{y}-\textbf{X}\beta) + \lambda e_j^T\beta.\nonumber \end{equation}\]

Denote the optimal solution by \((\hat\beta^{\text{new}}, \hat\lambda^{\text{new}})\). Taking the derivative w.r.t. \(\beta\) and setting it to zero, we get

\[\begin{equation} \frac{\partial L(\beta,\lambda)}{\partial \beta} = -2\textbf{X}^T(\textbf{y}-\textbf{X}\beta) + \lambda e_j = 0.\nonumber \end{equation}\]

So we have

\[\begin{equation} \hat\beta^{\text{new}} = (\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{y} - \frac{\hat\lambda^\text{new}}{2}(\textbf{X}^T\textbf{X})^{-1}e_j.\nonumber \end{equation}\]

Recalling the constraint \(e_j^T\hat\beta^{\text{new}}=0\), we obtain

\[\begin{equation} \hat\lambda^{\text{new}} = 2\frac{e_j^T(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{y}}{e_j^T(\textbf{X}^T\textbf{X})^{-1}e_j}.\label{eq:3-1a} \end{equation}\]

Since \(\hat\beta = (\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{y}\) is the unconstrained least squares estimate, the solution can be rewritten as

\[\begin{equation} \hat\beta^{\text{new}} = \hat\beta - \frac{\hat\lambda^{\text{new}}}{2}(\textbf{X}^T\textbf{X})^{-1}e_j.\nonumber \end{equation}\]
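Before continuing, one can check numerically that this closed form agrees with the estimate obtained by deleting the \(j\)-th column and reinserting a zero at position \(j\) (same hedged simulated setup as above; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, j = 100, 5, 2
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, p))])
y = X @ rng.normal(size=p + 1) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                    # unconstrained least squares estimate
e_j = np.zeros(p + 1)
e_j[j] = 1.0

# Multiplier and constrained estimate from the closed forms derived above
lam = 2 * (e_j @ XtX_inv @ X.T @ y) / (e_j @ XtX_inv @ e_j)
beta_new = beta_hat - 0.5 * lam * (XtX_inv @ e_j)

# Reference: refit with column j deleted, then put a zero back at position j
beta_drop = np.linalg.lstsq(np.delete(X, j, axis=1), y, rcond=None)[0]
beta_ref = np.insert(beta_drop, j, 0.0)

print(np.allclose(beta_new, beta_ref))          # True
```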

Therefore,

\[\begin{eqnarray} RSS_0 &=& (\textbf{y}-\textbf{X}\hat\beta^{\text{new}})^T(\textbf{y}-\textbf{X}\hat\beta^{\text{new}})\nonumber\\ &=&(\textbf{y}-\textbf{X}\hat\beta)^T(\textbf{y}-\textbf{X}\hat\beta)\nonumber\\ && +2 (\textbf{y}-\textbf{X}\hat\beta)^T\cdot \textbf{X}\cdot\frac{\hat\lambda^{\text{new}}}{2}(\textbf{X}^T\textbf{X})^{-1}e_j\nonumber\\ && + \frac{(\hat\lambda^{\text{new}})^2}{4}e_j^T(\textbf{X}^T\textbf{X})^{-1}e_j\nonumber\\ &=&RSS_1 + \frac{(\hat\lambda^{\text{new}})^2}{4}e_j^T(\textbf{X}^T\textbf{X})^{-1}e_j,\nonumber \end{eqnarray}\]

where the second summand above vanishes because

\[\begin{equation} (\textbf{y}-\textbf{X}\hat\beta)^T\cdot \textbf{X} = \textbf{y}^T\textbf{X} - \textbf{y}^T\textbf{X}(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{X} = \textbf{0}.\nonumber \end{equation}\]
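This decomposition can also be confirmed numerically (again a minimal sketch with the same simulated data as in the earlier snippets):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, j = 100, 5, 2
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, p))])
y = X @ rng.normal(size=p + 1) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
RSS1 = np.sum((y - X @ beta_hat) ** 2)

e_j = np.zeros(p + 1)
e_j[j] = 1.0
lam = 2 * (e_j @ XtX_inv @ X.T @ y) / (e_j @ XtX_inv @ e_j)

# RSS_0 computed by refitting without column j
X0 = np.delete(X, j, axis=1)
RSS0 = np.sum((y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]) ** 2)

# Check RSS_0 = RSS_1 + (lambda^2 / 4) * e_j^T (X^T X)^{-1} e_j
print(np.allclose(RSS0, RSS1 + 0.25 * lam ** 2 * (e_j @ XtX_inv @ e_j)))   # True
```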

Then, by \(\eqref{eq:3-1a}\), we have

\[\begin{eqnarray} RSS_0 - RSS_1 &=& \frac{\left(e_j^T(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{y}\right)^2}{e_j^T(\textbf{X}^T\textbf{X})^{-1}e_j}\nonumber\\ &=&\frac{\hat\beta^2_j}{\nu_{jj}},\nonumber \end{eqnarray}\]

since \(e_j^T(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{y} = e_j^T\hat\beta = \hat\beta_j\) and \(e_j^T(\textbf{X}^T\textbf{X})^{-1}e_j = \nu_{jj}\).

Dividing by \(\hat\sigma^2\) gives \(F = (RSS_0-RSS_1)/\hat\sigma^2 = \hat\beta_j^2/(\hat\sigma^2\nu_{jj}) = z_j^2\). The proof is now complete.
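As a final sanity check (a minimal numpy sketch with simulated data; nothing here comes from the text beyond formulas (3.12) and (3.13)), the identity \(F = z_j^2\) can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, j = 100, 5, 2
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, p))])
y = X @ rng.normal(size=p + 1) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
RSS1 = np.sum((y - X @ beta_hat) ** 2)
sigma2_hat = RSS1 / (N - p - 1)                 # \hat\sigma^2 = RSS_1 / (N - p_1 - 1)

# z-score (3.12) for the j-th coefficient
z_j = beta_hat[j] / np.sqrt(sigma2_hat * XtX_inv[j, j])

# F statistic (3.13) for dropping the j-th coefficient (p_1 - p_0 = 1)
X0 = np.delete(X, j, axis=1)
RSS0 = np.sum((y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]) ** 2)
F = (RSS0 - RSS1) / sigma2_hat

print(np.allclose(F, z_j ** 2))                 # True
```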