Ex. 14.10

Derive the solution to the affine-invariant average problem (14.60). Apply it to the three S’s, and compare the results to those computed in Exercise 14.9.

Soln. 14.10

We need to solve the problem

\[\begin{equation} \min_{\{\bb{A}_l\}_1^L, \bb{M}}\sum_{l=1}^N\|\bX_l\bb{A}_l - \bM\|_F^2,\non \end{equation}\]

where \(\bA_l\) are \(p\times p\) nonsingular matrices and \(\bM^T\bM = \bb{I}\).

The Lagrangian for the problem is

\[\begin{eqnarray} L(\bA_l, \bM, \bA) &=& \sum_{l=1}^L\text{trace}[(\bA_l^T\bX_l^T-\bM^T)(\bA_l\bX_l-\bM)] + \text{trace}[\bA(\bM^T\bM-\bb{I})].\non \end{eqnarray}\]

Taking derivative w.r.t \(\bA_l\) and setting it to be zero we have

\[\begin{eqnarray} \frac{\partial L(\bA_l, \bM, \bA)}{\partial \bA_l} &=& 2(\bX_l^T\bX_l\bA_l - \bX_l^T\bM)=0,\non \end{eqnarray}\]

thus we know \(\bA_l = (\bX_l^T\bX_l)^{-1}\bX_l\bM\). Denote \(\bb{H}_l = \bX_l(\bX_l^T\bX_l)^{-1}\bX_l\), and plug the solution for \(\bA_l\) into the original problem, it reduces to minimize

\[\begin{eqnarray} &&\sum_{l=1}^L \text{trace}[\bM^T(\bb{H}_l^T-\bb{I})(\bb{H}_l-\bb{I})\bM]\non\\ &=&\sum_{l=1}^L\text{trace}[\bM^T(\bb{I}-\bb{H}_l)\bM]\non\\ &=&L\text{trace}\left[\bM^T\bM - \bM^T\bar{\bb{H}}\bM\right]\non\\ &=&Lq - L\text{trace}[\bM^T\bar{\bb{H}}\bM]. \end{eqnarray}\]

Now we see the problem reduces to

\[\begin{equation} \max_{\bM} \text{trace}[\bM^T\bar{\bb{H}}\bM] \end{equation}\]

under the condition that \(\bM^T\bM = \bb{I}\). The optimal solution is achieved when \(\bM\) is an orthogonal basis of the eigenspace associated with the largest eigenvalues of \(\bar{\bb{H}}\). That is, \(\bM\) is the \(N\times p\) matrix formed from the \(p\) largest eigenvectors of \(\bar{\bb{H}}\). The proof follows directly from Courant-Fisher characterization.

We give a short proof here. Consider the Lagrangian again and taking derivatives w.r.t \(\bM\), we get

\[\begin{equation} \bar{\bb{H}}\bM = \bM \bar{\bb{A}},\non \end{equation}\]

for some symmetric \(\bar{\bb{A}}\). This is an invariant subspace equation. The equations allow \(\bM\) to be an arbitrary orthogonal basis for the rank-\(p\) subspace. It's then clear the choice of \(\bM\) described earlier is an optimal solution.