Soln. 14.8
We first derive the solution (14.57). We need to solve the following optimization problem
\[\begin{eqnarray}
&&\min_{\mu, \bb{R}}\text{trace}\left[\left(\bX_2 - (\bX_1\bR + \bb{1}\mu^T)\right)^T\left(\bX_2 - (\bX_1\bR + \bb{1}\mu^T)\right)\right]\non\\
&&\text{s.t.} \ \bR^T\bR = \bb{I}.\non
\end{eqnarray}\]
The Lagrangian for the problem is
\[\begin{eqnarray}
L(\mu, \bR, \bb{A}) &=& \text{trace}\left[\left(\bX_2 - (\bX_1\bR + \bb{1}\mu^T)\right)^T\left(\bX_2 - (\bX_1\bR + \bb{1}\mu^T)\right)\right]\non\\
&&\ + \text{trace}\left[\bb{A}(\bR^T\bR-\bb{I})\right].\non
\end{eqnarray}\]
By properties of derivatives on trace operator, we calculate the derivatives w.r.t to \(\mu, \bR\) and set them to zero, which gives
\[\begin{eqnarray}
\frac{\partial L(\mu,\bR,\bb{A})}{\partial\mu} &=& -2\bX_2^T\cdot \bb{1} + \bR^T\bX_1^T\cdot \bb{1} + 2N\mu = 0\non\\
\frac{\partial L(\mu,\bR,\bb{A})}{\partial \bR} &=& 2\bX_1^T\bX_1\bR - 2\bX_1^T\bX_2-2\bX_1^T\bb{1}\mu^T + \bR (\bb{A} + \bb{A}^T)=0\non
\end{eqnarray}\]
From the first equation above, we get
\[\begin{eqnarray}
\hat\mu &=& \frac{1}{N}\bX_2^T\cdot\bb{1} - \frac{1}{N}\bR^T\cdot \bX_1^T\cdot\bb{1}\non\\
&=&\bar x_2 - \bR^T\bar x_1.\non
\end{eqnarray}\]
Plug above into the second condition for \(\bR\) we get (by some algebra)
\[\begin{equation}
\label{eq:14-8a}
\tilde{\bX}_1^T \tilde{\bX}_2 = \tilde{\bX}_1^T\tilde{\bX}_1\bR + \bR \bar{\bb{A}},
\end{equation}\]
where
\[\begin{eqnarray}
\tilde{\bX}_i &=& \bX_i - \bb{1}\bar x_i^T\non\\
\bar{\bb{A}} &=& \frac{\bb{A} + \bb{A}^T}{2}.\non
\end{eqnarray}\]
Since \(\bR^T\bR=\bb{I}\), multiply \(\bR^T\) on both sides of \(\eqref{eq:14-8a}\) we get
\[\begin{equation}
\bR^T\tilde{\bX}_1^T \tilde{\bX}_2 = \bR^T\tilde{\bX}_1^T\tilde{\bX}_1\bR + \bar{\bb{A}}.\non
\end{equation}\]
Since both summands on the right hand side are symmetric, we know that \(\bR^T\tilde{\bX}_1^T \tilde{\bX}_2=\tilde{\bX}_2^T\tilde{\bX}_1^T\bR\). Now it's easy to verify that
\[\begin{equation}
\hat{\bR} = \bb{U}\bb{V}^T\non
\end{equation}\]
satisfies the condition, where \(\tilde{\bX}_1 \tilde{\bX}_2=\bb{U}\bb{D}\bb{V}^T\).
For the problem with scaling (14.58), the idea is the similar. The Lagrangian is
\[\begin{eqnarray}
L(\beta, \bR, \bb{A}) &=& \text{trace}[(\bX_2 - \beta\bX_1\bR)^T(\bX_2 - \beta\bX_1\bR)] + \text{trace}[\bb{A}(\bb{R}^T\bR-\bb{I})].\non
\end{eqnarray}\]
We have
\[\begin{eqnarray}
\frac{\partial L(\beta, \bR, \bb{A})}{\partial \beta} &=&
-2\text{trace}[\bX_2\bR^T\bX_1^T] + 2\beta \text{trace}[\bR^T\bX_1^T\bX_1\bR]\non\\
&=& -2\text{trace}[\bX_2\bR^T\bX_1^T] + 2\beta \text{trace}[
\bR\bR^T\bX_1^T\bX_1]\non\\
&=& -2\text{trace}[\bX_2\bR^T\bX_1^T] + 2\beta \text{trace}[
\bX_1^T\bX_1].\non
\end{eqnarray}\]
Therefore we see
\[\begin{equation}
\label{eq:14.8-b}
\hat\beta = \frac{\text{trace}[\bR^T\bX_1^T\bX_2]}{\|\bX_1\|_F}.
\end{equation}\]
With \(\hat\beta\) fixed, we already derived \(\hat\bR\) in the first part of this exercise, with \(\tilde{\bX}_1\) replaced with \(\hat\beta\tilde{\bX}_1\). Therefore we know \(\hat\bR = \bb{U}\bb{V}^T\) as before. Plug \(\hat\bR\) into \(\eqref{eq:14.8-b}\) we have
\[\begin{eqnarray}
\hat\beta &=& \frac{\text{trace}[\bb{V}\bb{U}^T\bb{U}\bb{D}\bb{V}^T]}{\|\bX_1\|_F}\non\\
&=&\frac{\text{trace}[\bb{D}]}{\|\bX_1\|_F}.\non
\end{eqnarray}\]