Ex. 2.6
Consider a regression problem with inputs \(x_i\) and outputs \(y_i\), and a parameterized model \(f_\theta(x)\) to be fit by least squares. Show that if there are observations with tied or identical values of \(x\), then the fit can be obtained from a reduced weighted least squares problem.
Soln. 2.6
Following the notation on page 32 of the text, suppose there are multiple observations \(y_{il}\), \(l=1,\dots,N_i\), at each distinct input value \(x_i\), \(i=1,\dots,N\). Our goal is to minimize
\[\begin{equation}
\text{RSS}(\theta) = \sum_{i=1}^N\sum_{l=1}^{N_i}(y_{il} - f_\theta(x_i))^2.\nonumber
\end{equation}\]
Let \(\bar y_i = \frac{1}{N_i}\sum_{l=1}^{N_i}y_{il}\) be the average of the observations \(y_{il}\) at \(x_i\). Expanding the RSS above, we get
\[\begin{eqnarray}
\text{RSS}(\theta) &=& \sum_{i=1}^N\sum_{l=1}^{N_i}(y_{il}^2 - 2 y_{il} f_\theta(x_i) + f_\theta(x_i)^2)\nonumber\\
&=&\sum_{i=1}^N N_i\left(\frac{\sum_{l=1}^{N_i}y_{il}^2}{N_i} -2\bar y_if_\theta(x_i)+ f_\theta(x_i)^2\right)\nonumber\\
&=&\sum_{i=1}^N N_i\left(\bar y_i - f_\theta(x_i)\right)^2 + \underbrace{\sum_{i=1}^N\sum_{l=1}^{N_i}\left(y_{il} - \bar y_i\right)^2}_{\text{independent of }\theta}.\nonumber
\end{eqnarray}\]
Since the remaining terms do not depend on \(\theta\), minimizing \(\text{RSS}(\theta)\) is equivalent to minimizing
\[\begin{equation}
\text{RSS}(\theta)' = \sum_{i=1}^N N_i\left(\bar y_i - f_\theta(x_i)\right)^2,\nonumber
\end{equation}\]
which is a weighted least squares problem on the reduced data set of \(N\) pairs \((x_i, \bar y_i)\), with weights \(N_i\).
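The equivalence can be checked numerically. The following is a minimal sketch, not part of the original solution: it assumes a linear model \(f_\theta(x) = \theta_0 + \theta_1 x\), and the data, the replicate counts, and the helper `design` are all made up for illustration. It fits the full data set with ties by ordinary least squares, then fits the reduced set \((x_i, \bar y_i)\) with weights \(N_i\), implemented by scaling each row by \(\sqrt{N_i}\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N = 4 distinct inputs x_i, each observed N_i times.
x_unique = np.array([0.0, 1.0, 2.0, 3.0])
counts = np.array([3, 1, 4, 2])                  # N_i
x_full = np.repeat(x_unique, counts)             # inputs with ties
y_full = 1.5 * x_full - 0.5 + rng.normal(size=x_full.size)


def design(x):
    """Design matrix for the assumed model f_theta(x) = theta_0 + theta_1 * x."""
    return np.column_stack([np.ones_like(x), x])


# Full problem: ordinary least squares on all sum(N_i) observations.
theta_full, *_ = np.linalg.lstsq(design(x_full), y_full, rcond=None)

# Reduced problem: one row per distinct x_i, response y_bar_i, weight N_i.
starts = np.concatenate([[0], np.cumsum(counts)[:-1]])
y_bar = np.add.reduceat(y_full, starts) / counts  # group means

# Weighted least squares via row scaling by sqrt(N_i).
w = np.sqrt(counts)
theta_reduced, *_ = np.linalg.lstsq(
    design(x_unique) * w[:, None], y_bar * w, rcond=None
)

print(theta_full)
print(theta_reduced)
```

Under these assumptions the two printed parameter vectors should agree up to floating-point error, since the two objectives differ only by a constant that does not involve \(\theta\).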