Ex. 2.6

Consider a regression problem with inputs \(x_i\) and outputs \(y_i\), and a parameterized model \(f_\theta(x)\) to be fit by least squares. Show that if there are observations with tied or identical values of \(x\), then the fit can be obtained from a reduced weighted least squares problem.

Soln. 2.6

We use the notation from page 32 of the text. Suppose there are multiple observation pairs \(x_i, y_{il}\), \(l=1,...,N_i\), at each distinct value \(x_i\), for \(i=1,...,N\). Our goal is to minimize

\[\begin{equation} \text{RSS}(\theta) = \sum_{i=1}^N\sum_{l=1}^{N_i}(y_{il} - f_\theta(x_i))^2.\nonumber \end{equation}\]

Let \(\bar y_i = \frac{1}{N_i}\sum_{l=1}^{N_i}y_{il}\) be the average of the \(y_{il}\) observed at \(x_i\). Expanding the RSS above, we get

\[\begin{eqnarray} \text{RSS}(\theta) &=& \sum_{i=1}^N\sum_{l=1}^{N_i}(y_{il}^2 - 2 y_{il} f_\theta(x_i) + f_\theta(x_i)^2)\nonumber\\ &=&\sum_{i=1}^N N_i\left(\frac{\sum_{l=1}^{N_i}y_{il}^2}{N_i} -2\bar y_if_\theta(x_i)+ f_\theta(x_i)^2\right)\nonumber\\ &=&\sum_{i=1}^N N_i\left(\bar y_i - f_\theta(x_i)\right)^2 + \text{Terms independent of }\theta.\nonumber \end{eqnarray}\]
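The last equality follows by completing the square in each summand. Written out with the same symbols as above, the terms independent of \(\theta\) are the within-group sums of squares:

\[\begin{equation} N_i\left(\frac{\sum_{l=1}^{N_i}y_{il}^2}{N_i} -2\bar y_if_\theta(x_i)+ f_\theta(x_i)^2\right) = N_i\left(\bar y_i - f_\theta(x_i)\right)^2 + \sum_{l=1}^{N_i}\left(y_{il} - \bar y_i\right)^2,\nonumber \end{equation}\]

since \(\sum_{l=1}^{N_i}(y_{il} - \bar y_i)^2 = \sum_{l=1}^{N_i}y_{il}^2 - N_i\bar y_i^2\).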

Therefore, we are essentially minimizing

\[\begin{equation} \text{RSS}'(\theta) = \sum_{i=1}^N N_i\left(\bar y_i - f_\theta(x_i)\right)^2,\nonumber \end{equation}\]

which is a weighted least squares problem over the \(N\) distinct inputs \(x_i\), with responses \(\bar y_i\) and weights \(N_i\).
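The equivalence can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes \(f_\theta(x) = \theta_0 + \theta_1 x\) (a linear model chosen for convenience; the exercise allows any parameterized \(f_\theta\)), uses made-up data with tied \(x\) values, and the helper names `fit_full` and `fit_reduced` are hypothetical.

```python
import numpy as np

def fit_full(x, y):
    """Ordinary least squares on all observations, ties included."""
    X = np.column_stack([np.ones_like(x), x])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def fit_reduced(x, y):
    """Weighted least squares on the distinct x values.

    Responses are the group means y_bar_i and weights are the counts N_i.
    """
    x_unique, inverse, counts = np.unique(
        x, return_inverse=True, return_counts=True
    )
    # Group means: sum of y within each group divided by the group size.
    y_bar = np.bincount(inverse, weights=y) / counts
    # Scaling rows by sqrt(N_i) turns weighted LS into ordinary LS.
    sqrt_w = np.sqrt(counts)
    X = np.column_stack([np.ones_like(x_unique), x_unique])
    theta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y_bar * sqrt_w, rcond=None)
    return theta

# Illustrative data (assumed, not from the text): x has tied values.
x = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 3.0, 3.0])
y = np.array([0.1, -0.2, 1.3, 0.9, 1.1, 2.4, 2.8, 3.5])

theta_full = fit_full(x, y)
theta_reduced = fit_reduced(x, y)
print(theta_full, theta_reduced)
```

Both fits should return the same \(\theta\) up to floating-point error, since the two objectives differ only by a constant that does not depend on \(\theta\).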