Ex. 14.8

Derive the solution (14.57) to the Procrustes problem (14.56). Derive also the solution to the Procrustes problem with scaling (14.58).

Soln. 14.8

We first derive the solution (14.57). We need to solve the following optimization problem

\[\begin{eqnarray} &&\min_{\mu, \bb{R}}\text{trace}\left[\left(\bX_2 - (\bX_1\bR + \bb{1}\mu^T)\right)^T\left(\bX_2 - (\bX_1\bR + \bb{1}\mu^T)\right)\right]\non\\ &&\text{s.t.} \ \bR^T\bR = \bb{I}.\non \end{eqnarray}\]

The Lagrangian for the problem is

\[\begin{eqnarray} L(\mu, \bR, \bb{A}) &=& \text{trace}\left[\left(\bX_2 - (\bX_1\bR + \bb{1}\mu^T)\right)^T\left(\bX_2 - (\bX_1\bR + \bb{1}\mu^T)\right)\right]\non\\ &&\ + \text{trace}\left[\bb{A}(\bR^T\bR-\bb{I})\right].\non \end{eqnarray}\]

By properties of derivatives on trace operator, we calculate the derivatives w.r.t to \(\mu, \bR\) and set them to zero, which gives

\[\begin{eqnarray} \frac{\partial L(\mu,\bR,\bb{A})}{\partial\mu} &=& -2\bX_2^T\cdot \bb{1} + \bR^T\bX_1^T\cdot \bb{1} + 2N\mu = 0\non\\ \frac{\partial L(\mu,\bR,\bb{A})}{\partial \bR} &=& 2\bX_1^T\bX_1\bR - 2\bX_1^T\bX_2-2\bX_1^T\bb{1}\mu^T + \bR (\bb{A} + \bb{A}^T)=0\non \end{eqnarray}\]

From the first equation above, we get

\[\begin{eqnarray} \hat\mu &=& \frac{1}{N}\bX_2^T\cdot\bb{1} - \frac{1}{N}\bR^T\cdot \bX_1^T\cdot\bb{1}\non\\ &=&\bar x_2 - \bR^T\bar x_1.\non \end{eqnarray}\]

Plug above into the second condition for \(\bR\) we get (by some algebra)

\[\begin{equation} \label{eq:14-8a} \tilde{\bX}_1^T \tilde{\bX}_2 = \tilde{\bX}_1^T\tilde{\bX}_1\bR + \bR \bar{\bb{A}}, \end{equation}\]

where

\[\begin{eqnarray} \tilde{\bX}_i &=& \bX_i - \bb{1}\bar x_i^T\non\\ \bar{\bb{A}} &=& \frac{\bb{A} + \bb{A}^T}{2}.\non \end{eqnarray}\]

Since \(\bR^T\bR=\bb{I}\), multiply \(\bR^T\) on both sides of \(\eqref{eq:14-8a}\) we get

\[\begin{equation} \bR^T\tilde{\bX}_1^T \tilde{\bX}_2 = \bR^T\tilde{\bX}_1^T\tilde{\bX}_1\bR + \bar{\bb{A}}.\non \end{equation}\]

Since both summands on the right hand side are symmetric, we know that \(\bR^T\tilde{\bX}_1^T \tilde{\bX}_2=\tilde{\bX}_2^T\tilde{\bX}_1^T\bR\). Now it's easy to verify that

\[\begin{equation} \hat{\bR} = \bb{U}\bb{V}^T\non \end{equation}\]

satisfies the condition, where \(\tilde{\bX}_1 \tilde{\bX}_2=\bb{U}\bb{D}\bb{V}^T\).

For the problem with scaling (14.58), the idea is the similar. The Lagrangian is

\[\begin{eqnarray} L(\beta, \bR, \bb{A}) &=& \text{trace}[(\bX_2 - \beta\bX_1\bR)^T(\bX_2 - \beta\bX_1\bR)] + \text{trace}[\bb{A}(\bb{R}^T\bR-\bb{I})].\non \end{eqnarray}\]

We have

\[\begin{eqnarray} \frac{\partial L(\beta, \bR, \bb{A})}{\partial \beta} &=& -2\text{trace}[\bX_2\bR^T\bX_1^T] + 2\beta \text{trace}[\bR^T\bX_1^T\bX_1\bR]\non\\ &=& -2\text{trace}[\bX_2\bR^T\bX_1^T] + 2\beta \text{trace}[ \bR\bR^T\bX_1^T\bX_1]\non\\ &=& -2\text{trace}[\bX_2\bR^T\bX_1^T] + 2\beta \text{trace}[ \bX_1^T\bX_1].\non \end{eqnarray}\]

Therefore we see

\[\begin{equation} \label{eq:14.8-b} \hat\beta = \frac{\text{trace}[\bR^T\bX_1^T\bX_2]}{\|\bX_1\|_F}. \end{equation}\]

With \(\hat\beta\) fixed, we already derived \(\hat\bR\) in the first part of this exercise, with \(\tilde{\bX}_1\) replaced with \(\hat\beta\tilde{\bX}_1\). Therefore we know \(\hat\bR = \bb{U}\bb{V}^T\) as before. Plug \(\hat\bR\) into \(\eqref{eq:14.8-b}\) we have

\[\begin{eqnarray} \hat\beta &=& \frac{\text{trace}[\bb{V}\bb{U}^T\bb{U}\bb{D}\bb{V}^T]}{\|\bX_1\|_F}\non\\ &=&\frac{\text{trace}[\bb{D}]}{\|\bX_1\|_F}.\non \end{eqnarray}\]