Soln. 3.1
First recall that (see the paragraph below (3.8) in the text)
\[\begin{equation}
\hat\sigma^2 = RSS_1/(N-p_1-1).\nonumber
\end{equation}\]
Note that \(z_j=\hat\beta_j/(\hat\sigma\sqrt{\nu_{jj}})\) ((3.12) in the text), where \(\nu_{jj}\) denotes the \(j\)-th diagonal element of \((\textbf{X}^T\textbf{X})^{-1}\). It therefore suffices to show that
\[\begin{equation}
RSS_0 - RSS_1 = \frac{\hat\beta_j^2}{\nu_{jj}}.\nonumber
\end{equation}\]
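Indeed, dropping a single coefficient means \(p_1-p_0=1\), so the F statistic (3.13) and the squared z-score reduce to
\[\begin{equation}
F = \frac{(RSS_0-RSS_1)/(p_1-p_0)}{RSS_1/(N-p_1-1)} = \frac{RSS_0-RSS_1}{\hat\sigma^2},
\qquad
z_j^2 = \frac{\hat\beta_j^2}{\hat\sigma^2\nu_{jj}},\nonumber
\end{equation}\]
and the identity above states precisely that these two quantities coincide.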
We already know \(RSS_1\), the residual sum-of-squares of the original least squares fit. To find \(RSS_0\), the residual sum-of-squares when the \(j\)-th coefficient is dropped from the original model, let \(e_j = (0,\dots, 1, \dots, 0)^T\in\mathbb{R}^{(p+1)\times 1}\) be the \(j\)-th standard basis vector and solve
\[\begin{eqnarray}
\min_{\beta\in \mathbb{R}^{(p+1)\times 1}}&& (\textbf{y}-\textbf{X}\beta)^T(\textbf{y}-\textbf{X}\beta)\nonumber\\
\text{s.t.}&& e_j^T\beta=0.\nonumber
\end{eqnarray}\]
The Lagrangian of the problem above is
\[\begin{equation}
L(\beta,\lambda) = (\textbf{y}-\textbf{X}\beta)^T(\textbf{y}-\textbf{X}\beta) + \lambda e_j^T\beta.\nonumber
\end{equation}\]
Denote the optimal solution by \((\hat\beta^{\text{new}}, \hat\lambda^{\text{new}})\).
Taking the derivative with respect to \(\beta\) and setting it to zero, we get
\[\begin{equation}
\frac{\partial L(\beta,\lambda)}{\partial \beta} = -2\textbf{X}^T(\textbf{y}-\textbf{X}\beta) + \lambda e_j = 0.\nonumber
\end{equation}\]
So we have
\[\begin{equation}
\hat\beta^{\text{new}} = (\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{y} - \frac{\hat\lambda^\text{new}}{2}(\textbf{X}^T\textbf{X})^{-1}e_j.\nonumber
\end{equation}\]
Recalling the constraint \(e_j^T\beta=0\), we obtain
\[\begin{equation}
\hat\lambda^{\text{new}} =
2\frac{e_j^T(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{y}}{e_j^T(\textbf{X}^T\textbf{X})^{-1}e_j}.\label{eq:3-1a}
\end{equation}\]
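Since \(e_j^T(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{y} = e_j^T\hat\beta = \hat\beta_j\), where \(\hat\beta = (\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{y}\) is the unconstrained least squares estimate, and \(e_j^T(\textbf{X}^T\textbf{X})^{-1}e_j = \nu_{jj}\), \(\eqref{eq:3-1a}\) can be written more compactly as
\[\begin{equation}
\hat\lambda^{\text{new}} = \frac{2\hat\beta_j}{\nu_{jj}}.\nonumber
\end{equation}\]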
Substituting back, we therefore have
\[\begin{equation}
\hat\beta^{\text{new}} = \hat\beta - \frac{\hat\lambda^{\text{new}}}{2}(\textbf{X}^T\textbf{X})^{-1}e_j.\nonumber
\end{equation}\]
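As a quick check, the \(j\)-th entry of \(\hat\beta^{\text{new}}\) vanishes, as the constraint requires:
\[\begin{equation}
e_j^T\hat\beta^{\text{new}} = \hat\beta_j - \frac{\hat\lambda^{\text{new}}}{2}\,e_j^T(\textbf{X}^T\textbf{X})^{-1}e_j = \hat\beta_j - \frac{\hat\beta_j}{\nu_{jj}}\,\nu_{jj} = 0.\nonumber
\end{equation}\]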
Therefore,
\[\begin{eqnarray}
RSS_0 &=& (\textbf{y}-\textbf{X}\hat\beta^{\text{new}})^T(\textbf{y}-\textbf{X}\hat\beta^{\text{new}})\nonumber\\
&=&(\textbf{y}-\textbf{X}\hat\beta)^T(\textbf{y}-\textbf{X}\hat\beta)\nonumber\\
&& +2 (\textbf{y}-\textbf{X}\hat\beta)^T\cdot \textbf{X}\cdot\frac{\hat\lambda^{\text{new}}}{2}(\textbf{X}^T\textbf{X})^{-1}e_j\nonumber\\
&& + \frac{(\hat\lambda^{\text{new}})^2}{4}e_j^T(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{X}(\textbf{X}^T\textbf{X})^{-1}e_j\nonumber\\
&=&RSS_1 + \frac{(\hat\lambda^{\text{new}})^2}{4}e_j^T(\textbf{X}^T\textbf{X})^{-1}e_j,\nonumber
\end{eqnarray}\]
where the second summand above vanishes because
\[\begin{equation}
(\textbf{y}-\textbf{X}\hat\beta)^T\cdot \textbf{X} = \textbf{y}^T\textbf{X} - \textbf{y}^T\textbf{X}(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{X} = \textbf{0}.\nonumber
\end{equation}\]
Then, by \(\eqref{eq:3-1a}\), we have
\[\begin{eqnarray}
RSS_0 - RSS_1 &=& \frac{(\hat\lambda^{\text{new}})^2}{4}\,e_j^T(\textbf{X}^T\textbf{X})^{-1}e_j\nonumber\\
&=& \frac{\left(e_j^T(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{y}\right)^2}{e_j^T(\textbf{X}^T\textbf{X})^{-1}e_j}\nonumber\\
&=&\frac{\hat\beta^2_j}{\nu_{jj}}.\nonumber
\end{eqnarray}\]
Combined with the reduction of the F statistic noted at the beginning, this gives \(F = (RSS_0-RSS_1)/\hat\sigma^2 = \hat\beta_j^2/(\hat\sigma^2\nu_{jj}) = z_j^2\). The proof is now complete.
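For readers who want a numerical confirmation, here is a minimal sanity check in Python. It is a sketch, not part of the proof: it assumes only numpy, the data are simulated, the variable names (`nu_jj`, `rss0`, etc.) are mine, and the constrained fit is computed by deleting the \(j\)-th column of \(\textbf{X}\), which is equivalent to the equality-constrained problem above.

```python
# Minimal numerical sanity check of Exercise 3.1 (assumes numpy; data simulated).
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 5
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])   # design matrix with intercept
y = X @ rng.normal(size=p + 1) + rng.normal(size=N)          # synthetic response

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                                  # unconstrained least squares fit
rss1 = np.sum((y - X @ beta_hat) ** 2)
sigma2_hat = rss1 / (N - p - 1)                               # \hat\sigma^2

j = 2                                                         # coefficient to drop
X0 = np.delete(X, j, axis=1)                                  # refit without the j-th column
beta0 = np.linalg.lstsq(X0, y, rcond=None)[0]
rss0 = np.sum((y - X0 @ beta0) ** 2)

nu_jj = XtX_inv[j, j]                                         # j-th diagonal of (X^T X)^{-1}
z_j = beta_hat[j] / np.sqrt(sigma2_hat * nu_jj)               # z-score (3.12)
F = (rss0 - rss1) / (rss1 / (N - p - 1))                      # F statistic (3.13), p1 - p0 = 1

print(np.isclose(rss0 - rss1, beta_hat[j] ** 2 / nu_jj))      # key identity of the proof
print(np.isclose(F, z_j ** 2))                                # F equals the squared z-score
```

Both checks should print `True` up to floating-point tolerance, for any choice of \(j\) and any simulated data set.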