Ex. 3.11
Ex. 3.11
Show that the solution to the multivariate linear regression problem (3.40) is given by (3.39). What happens if the covariance matrices \(\boldsymbol{\Sigma}_i\) are different for each observation?
Soln. 3.11
Like (3.38), we write (3.40) in matrix form
\[\begin{equation}
\text{RSS}(\textbf{B};\boldsymbol{\Sigma}) = \text{tr}[(\textbf{Y}-\textbf{X}\textbf{B})^T\boldsymbol{\Sigma}^{-1}(\textbf{Y}-\textbf{X}\textbf{B})].\nonumber
\end{equation}\]
By properties of trace operator, we have
\[\begin{eqnarray}
&&\text{RSS}(\textbf{B};\boldsymbol{\Sigma})\nonumber\\
&=& \text{tr}[(\textbf{Y}^T\boldsymbol{\Sigma}^{-1}-\textbf{B}^T\textbf{X}^T\boldsymbol{\Sigma}^{-1})(\textbf{Y}-\textbf{X}\textbf{B})]\nonumber\\
&=&\text{tr}(\textbf{Y}^T\boldsymbol{\Sigma}^{-1}\textbf{Y}-\textbf{Y}^T\boldsymbol{\Sigma}^{-1}\textbf{X}\textbf{B} - \textbf{B}^T\textbf{X}^T\boldsymbol{\Sigma}^{-1}\textbf{Y} + \textbf{B}^T\textbf{X}^T\boldsymbol{\Sigma}^{-1}\textbf{X}\textbf{B}).\nonumber
\end{eqnarray}\]
Taking derivative and setting it to be zero, we get
\[\begin{eqnarray}
&&\frac{\partial \text{RSS}(\textbf{B};\boldsymbol{\Sigma})}{\partial \textbf{B}}\nonumber\\
&=& \textbf{X}^T(\boldsymbol{\Sigma}^{-1}+(\boldsymbol{\Sigma}^{-1})^{T})\textbf{X}\textbf{B} - \textbf{X}^T(\boldsymbol{\Sigma}^{-1}+(\boldsymbol{\Sigma}^{-1})^{T})\textbf{Y}\nonumber\\
&=&\textbf{0}. \label{eq:3-11a}
\end{eqnarray}\]
Note that \(\boldsymbol{\Sigma}\) is a positive definite symmetric matrix, there exists \(\textbf{S}\) such that \(\boldsymbol{\Sigma}^{-1}=\textbf{S}\textbf{S}^T\). Therefore we obtain
\[\begin{eqnarray}
\hat{\textbf{B}} &=& (\textbf{X}^T\textbf{S}\textbf{S}^T\textbf{X})^{-1}\textbf{X}^T\textbf{S}\textbf{S}^T\textbf{Y}\nonumber\\
&=&(\textbf{X}^T\textbf{S}\textbf{S}^T\textbf{X})^{-1}\textbf{X}^T\textbf{S}\textbf{S}^T\textbf{X}\textbf{X}^T(\textbf{X}\textbf{X}^T)^{-1}\textbf{Y}\nonumber\\
&=&\textbf{X}^T(\textbf{X}\textbf{X}^T)^{-1}\textbf{Y}\nonumber\\
&=&(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{X}\textbf{X}^T(\textbf{X}\textbf{X}^T)^{-1}\textbf{Y}\nonumber\\
&=&(\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\textbf{Y},\nonumber
\end{eqnarray}\]
which is (3.39) in the text.
When \(\boldsymbol{\Sigma}_i\) are different, the simple solution for \(\textbf{B}\) above does not hold. Instead, we have to deal with equations like \(\eqref{eq:3-11a}\) with different \(\boldsymbol{\Sigma}_i\). Numerical solutions are available though, as the problem is essentially in quadratic form of \(\textbf{B}\).