Ex. 13.6
Here we consider the problem of shape averaging. In particular, \(\bb{L}_i,\ i=1,\dots,M\) are each \(N\times 2\) matrices of points in \(\mathbb{R}^2\), each sampled from corresponding positions of handwritten (cursive) letters. We seek an affine invariant average \(\bb{V}\), also \(N\times 2\), \(\bb{V}^T\bb{V}=\bb{I}\), of the \(M\) letters \(\bb{L}_i\) with the following property: \(\bb{V}\) minimizes
\[
\sum_{j=1}^M\min_{\bb{A}_j}\|\bb{L}_j-\bb{V}\bb{A}_j\|^2.
\]
Characterize the solution.
This solution can suffer if some of the letters are big and dominate the average. An alternative approach is to minimize instead
\[
\sum_{j=1}^M\min_{\bb{A}_j}\|\bb{L}_j\bb{A}_j-\bb{V}\|^2.
\]
Derive the solution to this problem. How do the criteria differ? Use the SVD of the \(\bb{L}_j\) to simplify the comparison of the two approaches.
Soln. 13.6
This exercise is similar to Ex. 14.10. Note that the norm \(\|\cdot\|\) here is the Frobenius norm, \(\|\bb{X}\|^2 = \text{trace}[\bb{X}^T\bb{X}]\) for a matrix \(\bb{X}\).
First consider
\[
\min_{\bb{A}_j}\|\bb{L}_j-\bb{V}\bb{A}_j\|^2
\]
with \(\bb{V}\) fixed. By definition, we have
\begin{align*}
\|\bb{L}_j-\bb{V}\bb{A}_j\|^2 &= \text{trace}[(\bb{L}_j-\bb{V}\bb{A}_j)^T(\bb{L}_j-\bb{V}\bb{A}_j)]\\
&= \text{trace}[\bb{L}_j^T\bb{L}_j]-2\,\text{trace}[\bb{A}_j^T\bb{V}^T\bb{L}_j]+\text{trace}[\bb{A}_j^T\bb{A}_j],
\end{align*}
where the last equality uses \(\bb{V}^T\bb{V}=\bb{I}\).
By properties of the trace operator (\cite{matbook}), we have
\[
\frac{\partial}{\partial\bb{A}_j}\|\bb{L}_j-\bb{V}\bb{A}_j\|^2 = -2\bb{V}^T\bb{L}_j+2\bb{A}_j,
\]
so the optimal coefficient matrix is \(\hat{\bb{A}}_j=\bb{V}^T\bb{L}_j\). Plugging it back into the original problem, we then need to minimize
\[
\sum_{j=1}^M\|\bb{L}_j-\bb{V}\bb{V}^T\bb{L}_j\|^2
\]
subject to \(\bb{V}^T\bb{V}=\bb{I}\).
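As a quick numerical sanity check on \(\hat{\bb{A}}_j=\bb{V}^T\bb{L}_j\) (not part of the derivation), the following numpy sketch draws a random letter and a random orthonormal \(\bb{V}\) and confirms that no nearby coefficient matrix achieves a smaller Frobenius error; all variable names are hypothetical.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
N = 50
L = rng.standard_normal((N, 2))                   # one "letter"
V, _ = np.linalg.qr(rng.standard_normal((N, 2)))  # orthonormal columns, V^T V = I
A_hat = V.T @ L                                   # claimed minimizer of ||L - V A||
best = np.linalg.norm(L - V @ A_hat)

for _ in range(1000):                             # random perturbations never win
    A = A_hat + 0.1 * rng.standard_normal((2, 2))
    assert np.linalg.norm(L - V @ A) >= best
\end{verbatim}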
Note that
\[
\|\bb{L}_j-\bb{V}\bb{V}^T\bb{L}_j\|^2 = \text{trace}[\bb{L}_j^T\bb{L}_j]-\text{trace}[\bb{V}^T\bb{L}_j\bb{L}_j^T\bb{V}],
\]
since \(\bb{I}-\bb{V}\bb{V}^T\) is symmetric and idempotent, so the problem reduces to
\[
\max_{\bb{V}:\,\bb{V}^T\bb{V}=\bb{I}}\ \text{trace}\Big[\bb{V}^T\Big(\sum_{j=1}^M\bb{L}_j\bb{L}_j^T\Big)\bb{V}\Big].
\]
Consider the Lagrangian function
\[
\mathcal{L}(\bb{V},\bb{\Lambda}) = \text{trace}\big[\bb{V}^T\bb{B}\bb{V}\big]-\text{trace}\big[\bb{\Lambda}(\bb{V}^T\bb{V}-\bb{I})\big],
\]
where \(\bb{\Lambda}\) is a \(2\times 2\) matrix of multipliers for the symmetric constraint \(\bb{V}^T\bb{V}=\bb{I}\). We know that
\[
\frac{\partial\mathcal{L}}{\partial\bb{V}} = 2\bb{B}\bb{V}-2\bb{V}\bb{\Lambda},
\]
where
\[
\bb{B}=\sum_{j=1}^M\bb{L}_j\bb{L}_j^T\quad\text{and}\quad\bb{\Lambda}
\]
are both symmetric.
Therefore the optimal \(\hat{\bb{V}}\) satisfies
\[
\bb{B}\hat{\bb{V}}=\hat{\bb{V}}\bb{\Lambda},
\]
which is an invariant subspace equation: it allows \(\hat{\bb{V}}\) to be an arbitrary orthonormal basis of any two-dimensional invariant subspace of \(\bb{B}\). To maximize the objective we take the leading subspace, so \(\hat{\bb{V}}\) is the \(N\times 2\) matrix formed from the eigenvectors corresponding to the 2 largest eigenvalues of \(\bb{L}=\frac{1}{M}\sum_{j=1}^M\bb{L}_j\bb{L}_j^T\) (which has the same eigenvectors as \(\bb{B}\)).
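For concreteness, here is a minimal numpy sketch of this first criterion, assuming the letters are supplied as a list of \(N\times 2\) arrays with rows in correspondence; the function name is hypothetical.
\begin{verbatim}
import numpy as np

def shape_average(Ls):
    """First criterion: top-2 eigenspace of (1/M) * sum_j L_j L_j^T."""
    M = len(Ls)
    S = sum(L @ L.T for L in Ls) / M      # N x N, symmetric PSD
    vals, vecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    return vecs[:, -2:]                   # N x 2 orthonormal basis of the
                                          # leading invariant subspace
\end{verbatim}
Any \(2\times 2\) orthogonal rotation of the returned columns satisfies the invariant subspace equation equally well.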
Now we turn to the second part of the exercise, where the objective becomes
\[
\sum_{j=1}^M\min_{\bb{A}_j}\|\bb{L}_j\bb{A}_j-\bb{V}\|^2.
\]
Following the same logic as above, with the least squares coefficient \(\hat{\bb{A}}_j=(\bb{L}_j^T\bb{L}_j)^{-1}\bb{L}_j^T\bb{V}\), we are able to show that the optimal \(\hat{\bb{V}}\) is the \(N\times 2\) matrix formed from the eigenvectors corresponding to the 2 largest eigenvalues of \(\bar{\bb{L}} = \frac{1}{M}\sum_{j=1}^M\bb{L}_j(\bb{L}_j^T\bb{L}_j)^{-1}\bb{L}_j^T\). For a detailed proof, see Ex. 14.10.
To compare the two criteria, write the SVD \(\bb{L}_j=\bb{U}_j\bb{D}_j\bb{V}_j^T\), with \(\bb{U}_j\) of size \(N\times 2\) and \(\bb{D}_j\) the \(2\times 2\) diagonal matrix of singular values. Then
\[
\bb{L}_j\bb{L}_j^T=\bb{U}_j\bb{D}_j^2\bb{U}_j^T,
\qquad
\bb{L}_j(\bb{L}_j^T\bb{L}_j)^{-1}\bb{L}_j^T=\bb{U}_j\bb{U}_j^T.
\]
The second criterion thus averages only the orthogonal projections onto the column spaces of the \(\bb{L}_j\): the singular values, i.e., the sizes of the letters, drop out. It is therefore a normalized version of the first criterion, in which large letters no longer dominate the average.
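Under the same assumptions as the sketch above, the second criterion can be written by replacing \(\bb{L}_j\bb{L}_j^T\) with the projection \(\bb{U}_j\bb{U}_j^T\) (each \(\bb{L}_j\) is assumed to have full column rank):
\begin{verbatim}
import numpy as np

def shape_average_normalized(Ls):
    """Second criterion: top-2 eigenspace of (1/M) * sum_j U_j U_j^T."""
    M = len(Ls)
    N = Ls[0].shape[0]
    S = np.zeros((N, N))
    for L in Ls:
        U, _, _ = np.linalg.svd(L, full_matrices=False)  # U is N x 2
        S += U @ U.T / M                                  # projection onto col(L_j)
    vals, vecs = np.linalg.eigh(S)
    return vecs[:, -2:]                                   # N x 2
\end{verbatim}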