Ex. 3.21


Show that the solution to the reduced-rank regression problem (3.68), with \(\boldsymbol{\Sigma}\) estimated by \(\bb{Y}^T\bb{Y}/N\), is given by (3.69).

Hint: Transform \(\bb{Y}\) to \(\bb{Y}^\ast=\bb{Y}\boldsymbol{\Sigma}^{-\frac{1}{2}}\), and solve in terms of the canonical vectors \(u_m^\ast\). Show that \(\bb{U}_m=\boldsymbol{\Sigma}^{-\frac{1}{2}}\bb{U}_m^\ast\), and a generalized inverse is \(\bb{U}_m^- = \bb{U}_m^{\ast T}\boldsymbol{\Sigma}^{\frac{1}{2}}\).

Soln. 3.21

The problem can be rewritten as

\[\begin{eqnarray} \underset{\text{rank}(\bb{B})=m}{\operatorname{argmin}} \text{trace}\left[(\bY-\bX\bb{B})\bm{\Sigma}^{-1}(\bY-\bX\bb{B})^T\right].\non \end{eqnarray}\]

By properties of the trace operator and the fact that \(\bm{\Sigma}\) is symmetric and positive definite, we have

\[\begin{eqnarray} &&\text{trace}\left[(\bY-\bX\bb{B})\bm{\Sigma}^{-1}(\bY-\bX\bb{B})^T\right]\non\\ &=&\text{trace}\left[(\bY\bm{\Sigma}^{-\frac{1}{2}}-\bX\bb{B}\bm{\Sigma}^{-\frac{1}{2}})(\bY\bm{\Sigma}^{-\frac{1}{2}}-\bX\bb{B}\bm{\Sigma}^{-\frac{1}{2}})^T\right]\non\\ &=&\text{trace}\left[(\bY\bm{\Sigma}^{-\frac{1}{2}}-\bX\bb{B}\bm{\Sigma}^{-\frac{1}{2}})(\bm{\Sigma}^{-\frac{1}{2}}\bY^T-\bm{\Sigma}^{-\frac{1}{2}}\bB^T\bX^T)\right]\non\\ &=&\text{trace}\left[\bY\bm{\Sigma}^{-1}\bY^T\right] + \text{trace}\left[\bm{\Sigma}^{-\frac{1}{2}}\bB^T\bX^T\bX\bB\bm{\Sigma}^{-\frac{1}{2}}\right] - 2\,\text{trace}\left[\bm{\Sigma}^{-\frac{1}{2}}\bY^T\bX\bB\bm{\Sigma}^{-\frac{1}{2}}\right]\non\\ &=&\text{trace}\left[\bY\bm{\Sigma}^{-1}\bY^T\right] - \text{trace}\left[\bm{\Sigma}^{-1}\bY^T\bX(\bX^T\bX)^{-1}\bX^T\bY\right] + \text{trace}\left[\bb{C}\bb{C}^T\right]\label{eq:3-21a} \end{eqnarray}\]

where the last equality follows by completing the square in \(\bB\), with

\[\begin{equation} \label{eq:3-21b} \bb{C} = \bm{\Sigma}^{-\frac{1}{2}}\bB^T(\bX^T\bX)^{\frac{1}{2}}-\bm{\Sigma}^{-\frac{1}{2}}(\bY^T\bX)(\bX^T\bX)^{-\frac{1}{2}}.\non \end{equation}\]

Note that the first two summands in \(\eqref{eq:3-21a}\) do not depend on \(\bB\), so the original problem reduces to the standard low-rank approximation problem

\[\begin{equation} \label{eq:3-21c} \underset{\text{rank}(\bb{B})=m}{\operatorname{argmin}}\|\bm{\Sigma}^{-\frac{1}{2}}\bB^T(\bX^T\bX)^{\frac{1}{2}}-\bm{\Sigma}^{-\frac{1}{2}}(\bY^T\bX)(\bX^T\bX)^{-\frac{1}{2}}\|_F^2, \end{equation}\]

where \(\|M\|_F^2 = \text{trace}(M^TM)\) denotes the squared Frobenius norm of a matrix \(M\) (see, e.g., page 540 in the text).
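
As a quick numerical sanity check (not part of the text), the following NumPy sketch verifies on random data that the weighted residual criterion splits into a term free of \(\bB\) plus \(\|\bb{C}\|_F^2\), as in \(\eqref{eq:3-21a}\)–\(\eqref{eq:3-21c}\). The dimensions and the construction of \(\bm{\Sigma}\) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, K = 50, 4, 3                          # samples, predictors, responses (arbitrary)

X = rng.standard_normal((N, p))
Y = rng.standard_normal((N, K))
B = rng.standard_normal((p, K))             # an arbitrary coefficient matrix
A = rng.standard_normal((K, K))
Sigma = A.T @ A + np.eye(K)                 # some symmetric positive definite Sigma

def sym_power(M, power):
    """Symmetric matrix power M^power via the eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** power) @ V.T

Sig_mh = sym_power(Sigma, -0.5)             # Sigma^{-1/2}
Sig_inv = Sig_mh @ Sig_mh                   # Sigma^{-1}
XtX = X.T @ X

# Left-hand side of (3-21a): the weighted residual criterion.
R = Y - X @ B
lhs = np.trace(R @ Sig_inv @ R.T)

# Right-hand side: two traces that do not involve B, plus ||C||_F^2.
C = Sig_mh @ B.T @ sym_power(XtX, 0.5) - Sig_mh @ (Y.T @ X) @ sym_power(XtX, -0.5)
const = np.trace(Y @ Sig_inv @ Y.T) \
        - np.trace(Sig_inv @ Y.T @ X @ np.linalg.inv(XtX) @ X.T @ Y)
rhs = const + np.linalg.norm(C, 'fro') ** 2

print(np.allclose(lhs, rhs))                # True
```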

By standard results on low-rank approximation (e.g., the Eckart–Young–Mirsky theorem), and since \(\bm{\Sigma}^{-\frac{1}{2}}\) and \((\bX^T\bX)^{\frac{1}{2}}\) are invertible (so that the matrix in \(\eqref{eq:3-21c}\) has the same rank as \(\bB\)), the optimal rank-\(m\) solution \(\hat\bB(m)\) of problem \(\eqref{eq:3-21c}\) satisfies

\[\begin{equation} \label{eq:3-21d} \bm{\Sigma}^{-\frac{1}{2}}\hat\bB(m)^T(\bX^T\bX)^{\frac{1}{2}} = \sum_{i=1}^m d_i\bb{u}_i\bb{v}_i^T \end{equation}\]

where \(\bb{u}_i\) and \(\bb{v}_i\) are the \(i\)-th columns of \(\bb{U}^\ast\) and \(\bb{V}^\ast\), respectively, and \(d_i\) is the \(i\)-th largest singular value, obtained from the following SVD

\[\begin{equation} \label{eq:3-21e} \bm{\Sigma}^{-\frac{1}{2}}(\bY^T\bX)(\bX^T\bX)^{-\frac{1}{2}} = \bb{U}^\ast\bb{D}^\ast\bb{V}^{\ast T}. \end{equation}\]
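
Before continuing, here is a small, self-contained illustration (not from the text) of the Eckart–Young–Mirsky fact invoked above: the truncated SVD gives the best Frobenius-norm approximation among matrices of rank at most \(m\). The matrix, its dimensions, and the number of random competitors are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 4))              # a generic matrix, dimensions arbitrary
m = 2

U, d, Vt = np.linalg.svd(M, full_matrices=False)
M_m = U[:, :m] @ np.diag(d[:m]) @ Vt[:m, :]  # rank-m truncated SVD
best_err = np.linalg.norm(M - M_m, 'fro')

# Randomly generated rank-m competitors never do better (a spot check, not a proof).
for _ in range(1000):
    W = rng.standard_normal((6, m)) @ rng.standard_normal((m, 4))
    assert np.linalg.norm(M - W, 'fro') >= best_err
print("no rank-%d competitor beat the truncated SVD" % m)
```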

Since \(\bb{U}^\ast\) has orthonormal columns, \(\eqref{eq:3-21e}\) yields

\[\begin{equation} \bb{U}^{\ast T}\bm{\Sigma}^{-\frac{1}{2}}(\bY^T\bX)(\bX^T\bX)^{-\frac{1}{2}} = \bb{D}^\ast\bb{V}^{\ast T}\non \end{equation}\]

and thus, taking transposes,

\[\begin{equation} \bb{V}^\ast\bb{D}^\ast = (\bX^T\bX)^{-\frac{1}{2}}\bX^T\bY\bm{\Sigma}^{-\frac{1}{2}}\bb{U}^\ast.\non \end{equation}\]

Then, by \(\eqref{eq:3-21d}\), we know

\[\begin{eqnarray} \label{eq:3-21f} \hat \bB(m) &=& (\bX^T\bX)^{-\frac{1}{2}}\left(\sum_{i=1}^m (\bb{v}_i d_i)\cdot \bb{u}_i^T\right)\bm{\Sigma}^{\frac{1}{2}}\non\\ &=&(\bX^T\bX)^{-\frac{1}{2}}\left((\bX^T\bX)^{-\frac{1}{2}}\bX^T\bY\bm{\Sigma}^{-\frac{1}{2}}\bb{U}_m^\ast \cdot \bb{U}_m^{\ast T}\right)\bm{\Sigma}^{\frac{1}{2}}\non\\ &=&(\bX^T\bX)^{-1}\bX^T\bY\bb{U}_m\bb{U}_m^-\non\\ &=&\hat \bB\bb{U}_m\bb{U}_m^-, \end{eqnarray}\]

where \(\bb{U}_m=\bm{\Sigma}^{-\frac{1}{2}}\bb{U}_m^\ast\) and \(\bb{U}_m^- = \bb{U}_m^{\ast T}\bm{\Sigma}^{\frac{1}{2}}\). Note that \(\bb{U}_m^-\bb{U}_m = \bb{U}_m^{\ast T}\bm{\Sigma}^{\frac{1}{2}}\bm{\Sigma}^{-\frac{1}{2}}\bb{U}_m^\ast = \bb{I}_m\), so \(\bb{U}_m^-\) is indeed a generalized inverse of \(\bb{U}_m\), as claimed in the hint.
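
The chain of equalities in \(\eqref{eq:3-21f}\) can also be checked numerically. The sketch below (again a toy example with arbitrary dimensions, not code from the text) builds \(\hat\bB(m)\) directly from the truncated SVD in \(\eqref{eq:3-21d}\) and confirms that it equals \(\hat\bB\,\bb{U}_m\bb{U}_m^-\) and that \(\bb{U}_m^-\bb{U}_m=\bb{I}_m\).

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, K, m = 50, 5, 3, 2                     # arbitrary sizes and target rank

X = rng.standard_normal((N, p))
Y = rng.standard_normal((N, K))
A = rng.standard_normal((K, K))
Sigma = A.T @ A + np.eye(K)                  # a generic positive definite Sigma

def sym_power(M, power):
    """Symmetric matrix power M^power via the eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(w ** power) @ V.T

Sig_h, Sig_mh = sym_power(Sigma, 0.5), sym_power(Sigma, -0.5)
XtX = X.T @ X
XtX_mh = sym_power(XtX, -0.5)

# SVD (3-21e) and the rank-m solution read off from (3-21d).
Ustar, d, Vstar_t = np.linalg.svd(Sig_mh @ Y.T @ X @ XtX_mh, full_matrices=False)
target_m = (Ustar[:, :m] * d[:m]) @ Vstar_t[:m, :]       # sum_{i<=m} d_i u_i v_i^T
B_hat_m = XtX_mh @ target_m.T @ Sig_h                    # solve (3-21d) for B(m)

# The closed form (3-21f): B_hat_m = B_hat U_m U_m^-.
B_hat = np.linalg.inv(XtX) @ X.T @ Y                     # full least-squares coefficients
U_m = Sig_mh @ Ustar[:, :m]
U_m_ginv = Ustar[:, :m].T @ Sig_h

print(np.allclose(B_hat_m, B_hat @ U_m @ U_m_ginv))      # True
print(np.allclose(U_m_ginv @ U_m, np.eye(m)))            # True: a generalized inverse
```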

Note that so far we have not used the assumption that \(\bm{\Sigma}=\bY^T\bY/N\). From \(\eqref{eq:3-21c}\) it is easy to see that the solution is unchanged when \(\bm{\Sigma}\) is multiplied by a positive constant, so without loss of generality we may take \(\bm{\Sigma} = \bY^T\bY\); then \(\eqref{eq:3-21e}\) becomes exactly (3.87) in the text.
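
Finally, the scale-invariance claim can be spot-checked numerically. The sketch below (a toy example, not from the text) wraps the reduced-rank fit in a small helper and confirms that rescaling \(\bm{\Sigma}\) by a constant leaves \(\hat\bB(m)\) unchanged.

```python
import numpy as np

def rr_coef(X, Y, Sigma, m):
    """Rank-m coefficient matrix B_hat(m) = B_hat U_m U_m^- from (3-21f)."""
    def sym_power(M, power):
        w, V = np.linalg.eigh(M)
        return V @ np.diag(w ** power) @ V.T
    Sig_h, Sig_mh = sym_power(Sigma, 0.5), sym_power(Sigma, -0.5)
    XtX = X.T @ X
    Ustar, _, _ = np.linalg.svd(Sig_mh @ Y.T @ X @ sym_power(XtX, -0.5),
                                full_matrices=False)
    B_hat = np.linalg.inv(XtX) @ X.T @ Y
    return B_hat @ Sig_mh @ Ustar[:, :m] @ Ustar[:, :m].T @ Sig_h

rng = np.random.default_rng(3)
X, Y = rng.standard_normal((40, 5)), rng.standard_normal((40, 3))
Sigma = Y.T @ Y / 40                         # the estimate used in the exercise

# Rescaling Sigma (here by N = 40) does not change the fitted coefficients.
print(np.allclose(rr_coef(X, Y, Sigma, 2), rr_coef(X, Y, 40 * Sigma, 2)))   # True
```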