Ex. 2.9
Consider a linear regression model with \(p\) parameters, fit by least squares to a set of training data \((x_1, y_1), \ldots, (x_N, y_N)\) drawn at random from a population. Let \(\hat\beta\) be the least squares estimate. Suppose we have some test data \((\tilde x_1, \tilde y_1), \ldots, (\tilde x_M, \tilde y_M)\) drawn at random from the same population as the training data. If \(R_{tr}(\beta) = \frac{1}{N}\sum_{i=1}^N(y_i-\beta^Tx_i)^2\) and \(R_{te}(\beta) = \frac{1}{M}\sum_{i=1}^M(\tilde y_i-\beta^T\tilde x_i)^2\), prove that
\[
E[R_{tr}(\hat\beta)] \le E[R_{te}(\hat\beta)],
\]
where the expectations are over all that is random in each expression. [This exercise was brought to our attention by Ryan Tibshirani, from a homework assignment given by Andrew Ng.]
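Before the proof, it may help to see the claim numerically. Below is a minimal Monte Carlo sketch, assuming (purely for illustration) a Gaussian linear population; the dimension, sample sizes, seed, and coefficient are arbitrary choices, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
p, N, M, reps = 5, 20, 20, 2000
beta_true = rng.normal(size=p)      # illustrative population coefficient

r_tr, r_te = [], []
for _ in range(reps):
    # Training and test data drawn from the same population.
    X = rng.normal(size=(N, p))
    y = X @ beta_true + rng.normal(size=N)
    Xt = rng.normal(size=(M, p))
    yt = Xt @ beta_true + rng.normal(size=M)
    # Least squares fit on the training data only.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    r_tr.append(np.mean((y - X @ beta_hat) ** 2))
    r_te.append(np.mean((yt - Xt @ beta_hat) ** 2))

# The averages estimate the two expectations; the first should be smaller.
print(f"E[R_tr(beta_hat)] ~ {np.mean(r_tr):.3f}  <=  E[R_te(beta_hat)] ~ {np.mean(r_te):.3f}")
```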
Soln. 2.9
Note that both the design matrix \(\textbf{X}\) (whose \(i\)th row is \(x_i^T\)) and the response vector \(\textbf{Y} = (y_1, \ldots, y_N)^T\) are random, since the training data are drawn at random from the population.
When \(\textbf{X}^T\textbf{X}\) is non-singular, we know that
\[
\hat\beta = (\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{Y},
\]
which is also random. When \(\textbf{X}^T\textbf{X}\) is singular, the closed-form expression above no longer holds; however, there still exists a measurable function \(\phi\) such that
\[
\hat\beta = \phi(\textbf{X}, \textbf{Y}).
\]
In either case, \(\hat\beta\) is a function of the training data alone, and hence independent of the test data.
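The function \(\phi\) is needed because the minimizer of \(R_{tr}\) is not unique in the singular case. One concrete measurable choice is the minimum-norm least squares solution \(\hat\beta = \textbf{X}^+\textbf{Y}\), where \(\textbf{X}^+\) is the Moore-Penrose pseudo-inverse; a small sketch using NumPy:

```python
import numpy as np

def phi(X, Y):
    # Minimum-norm least squares solution: a measurable function of (X, Y)
    # that minimizes R_tr whether or not X^T X is singular.
    return np.linalg.pinv(X) @ Y

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))
X = np.hstack([X, X[:, :1]])   # duplicate a column, so X^T X is singular
Y = rng.normal(size=10)
print(phi(X, Y))               # still well defined
```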
Recall the definition of \(\hat\beta\) and the IID assumption on \((x_i, y_i)\) for \(i=1,\ldots,N\). Since \(\hat\beta\) minimizes \(R_{tr}\), for any fixed \(\beta\) and any \(i=1,\ldots,N\) we have
\begin{equation}
E[R_{tr}(\hat\beta)] \le E[R_{tr}(\beta)] = \frac{1}{N}\sum_{j=1}^N E(y_j-\beta^Tx_j)^2 = E(y_i-\beta^Tx_i)^2.
\label{eq:2-9a}
\end{equation}
Assume \(E[x_1x_1^T]\) is positive definite (equivalently, \(x_1\) does not lie in a fixed proper subspace of \(\mathbb{R}^p\) almost surely), and let
\[
\beta^* = \arg\min_\beta E(y_1-\beta^Tx_1)^2 = \left(E[x_1x_1^T]\right)^{-1}E[x_1y_1]
\]
be the population least squares coefficient.
Plugging \(\beta = \beta^*\) into \(\eqref{eq:2-9a}\) with \(i=1\), and using the IID assumption on \((\tilde x_i, \tilde y_i)\) for \(i=1,\ldots,M\) together with the independence between \(\hat\beta\) and the test data, we have
\begin{align*}
E[R_{tr}(\hat\beta)] &\le E(y_1-{\beta^*}^Tx_1)^2 = E(\tilde y_1-{\beta^*}^T\tilde x_1)^2\\
&= \min_\beta E(\tilde y_1-\beta^T\tilde x_1)^2\\
&\le E\left[E\left((\tilde y_1-\hat\beta^T\tilde x_1)^2\mid\hat\beta\right)\right] = E(\tilde y_1-\hat\beta^T\tilde x_1)^2\\
&= \frac{1}{M}\sum_{i=1}^M E(\tilde y_i-\hat\beta^T\tilde x_i)^2 = E[R_{te}(\hat\beta)],
\end{align*}
which completes the proof.
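The sandwich role of \(\beta^*\) in the chain above can also be checked numerically. Below is a sketch assuming a population with \(E[x_1x_1^T] = I\), so that \(\beta^*\) coincides with the coefficient used to simulate the data; all names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
p, N, reps = 3, 15, 4000
beta_star = rng.normal(size=p)   # with E[x x^T] = I, beta* is the true coefficient

r_tr, risk_star, r_te = [], [], []
for _ in range(reps):
    X = rng.normal(size=(N, p))
    y = X @ beta_star + rng.normal(size=N)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    # One fresh test point suffices, since E[R_te] does not depend on M.
    xt = rng.normal(size=p)
    yt = xt @ beta_star + rng.normal()
    r_tr.append(np.mean((y - X @ beta_hat) ** 2))
    risk_star.append((yt - xt @ beta_star) ** 2)
    r_te.append((yt - xt @ beta_hat) ** 2)

print(f"E[R_tr] ~ {np.mean(r_tr):.3f}  <=  "
      f"E(y - beta*'x)^2 ~ {np.mean(risk_star):.3f}  <=  "
      f"E[R_te] ~ {np.mean(r_te):.3f}")
```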