Ex. 13.4

Consider an image to be a function \(F(x):\mathbb{R}^2\ra\mathbb{R}^1\) over the two-dimensional spatial domain (paper coordinates). Then \(F(c+x_0+\bb{A}(x-x_0))\) represents an affine transformation of the image \(F\), where \(\bb{A}\) is a \(2\times 2\) matrix.

(a) Decompose \(\bb{A}\) (via Q-R) in such a way that parameters identifying the four affine transformations (two scale, shear and rotation) are clearly identified.

(b) Using the chain rule, show that the derivative of \(F(c+x_0+\bb{A}(x-x_0))\) w.r.t. each of these parameters can be represented in terms of the two spatial derivatives of \(F\).

(c) Using a two-dimensional kernel smoother (Chapter 6), describe how to implement this procedure when the images are quantized to \(16\times16\) pixels.

Soln. 13.4

(a) In this representation, \(c\) accounts for location shifts, \(x_0\) is the center of rotation, scaling, and shear, and \(\bb{A}\in \mathbb{R}^{2\times 2}\) is the transformation matrix. Its QR decomposition is \(\bb{A}=R(\theta)T\), where \(R(\theta)\) is a rotation matrix with angle \(\theta\) and \(T\) is an upper-triangular matrix whose diagonal entries \(T_{11}\) and \(T_{22}\) give the two scales and whose off-diagonal entry \(T_{12}\) gives the shear.
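The decomposition can be checked numerically. The following is a minimal sketch, assuming \(\det \bb{A} > 0\) (no reflection) and a clockwise sign convention for \(R(\theta)\), chosen so that the sign of the rotation tangent in part (b) comes out as listed there. The function names `rotation` and `decompose_affine` are hypothetical helpers, not part of any library.

```python
import numpy as np

def rotation(theta):
    """Clockwise rotation matrix (sign convention assumed for this sketch)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

def decompose_affine(A):
    """Split a 2x2 matrix with det(A) > 0 into (theta, T), A = rotation(theta) @ T."""
    A = np.asarray(A, dtype=float)
    if np.linalg.det(A) <= 0:
        raise ValueError("this sketch assumes det(A) > 0 (no reflection)")
    Q, T = np.linalg.qr(A)
    # np.linalg.qr does not fix signs; flip them so that diag(T) > 0.
    signs = np.sign(np.diag(T))
    Q = Q * signs            # scale the columns of Q
    T = signs[:, None] * T   # scale the rows of T, so Q @ T is unchanged
    # With det(A) > 0 and diag(T) > 0, Q is a proper rotation [[c, s], [-s, c]].
    theta = np.arctan2(Q[0, 1], Q[0, 0])
    return theta, T

if __name__ == "__main__":
    A = rotation(0.3) @ np.array([[1.2, 0.4], [0.0, 0.8]])
    theta, T = decompose_affine(A)
    print("theta:", theta)
    print("x-scale T11:", T[0, 0], "y-scale T22:", T[1, 1], "shear T12:", T[0, 1])
```

Forcing the diagonal of \(T\) to be positive makes the factorization unique, so the four parameters \(\theta\), \(T_{11}\), \(T_{22}\), and \(T_{12}\) can be read off directly.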

(b) Expanding around the identity transformation \(\alpha_0\) (\(c=0\), \(\theta=0\), \(T=I\)) and applying the chain rule, the first-order Taylor approximation has the form

\[\begin{eqnarray} F^T(x, c, A) &=& F(x) + \sum_{\alpha \in\{c, \theta, T\}}\frac{\partial F}{\partial \alpha}(\alpha-\alpha_0)\non\\ &=&F(x) + \nabla F(x)^T\sum_{\alpha\in \{c, \theta, T\}}\frac{\partial Z(x, x_0, c, A)}{\partial \alpha}(\alpha-\alpha_0),\non \end{eqnarray}\]

where \(Z(x, x_0, c, A) := c+ x_0 + \bb{A}(x-x_0)\).

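This leads to the following six derivative (tangent) functions \(F_\alpha(x)\), where \(F_x\) and \(F_y\) are the two spatial derivatives of \(F\), evaluated at \(z = Z(x, x_0, c, A)\) (which reduces to \(z = x\) at the identity transformation \(c = 0\), \(\bb{A} = I\)):

- x-location: \(\alpha = c_1\) and \(F_\alpha = F_x(z) = \frac{\partial F(z)}{\partial x}\)
- y-location: \(\alpha = c_2\) and \(F_\alpha = F_y(z) = \frac{\partial F(z)}{\partial y}\)
- x-scale: \(\alpha = T_{11}\) and \(F_\alpha = (x-x_0)F_x(z)\)
- y-scale: \(\alpha = T_{22}\) and \(F_\alpha = (y-y_0)F_y(z)\)
- Rotation: \(\alpha = \theta\) and \(F_\alpha = (y-y_0)F_x(z) - (x-x_0)F_y(z)\)
- Shear: \(\alpha = T_{12}\) and \(F_\alpha = (y-y_0)F_x(z) + (x-x_0)F_y(z)\)

The sign of the rotation tangent corresponds to measuring \(\theta\) clockwise. The shear tangent is written in symmetric form: differentiating with respect to \(T_{12}\) alone gives \((y-y_0)F_x(z)\), and the symmetric form equals twice this minus the rotation tangent, so together with the rotation tangent both choices span the same tangent space.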
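The six tangent functions can be sanity-checked by finite differences. The sketch below is illustrative only: the test image `F`, the center `x0`, and all helper names are hypothetical, and each linear transformation is represented by an assumed generator matrix \(G\) so that the perturbed map is \(x_0 + (I + \epsilon G)(x - x_0)\). The symmetric generator is used for shear and the clockwise generator for rotation, which reproduces the formulas exactly as listed.

```python
import numpy as np

def F(p):
    """Hypothetical smooth test image; p has shape (n, 2) with columns (x, y)."""
    x, y = p[:, 0], p[:, 1]
    return np.sin(x) * np.cos(2.0 * y) + 0.5 * x * y

def grad_F(p):
    """Analytic spatial derivatives (F_x, F_y) of the test image."""
    x, y = p[:, 0], p[:, 1]
    Fx = np.cos(x) * np.cos(2.0 * y) + 0.5 * y
    Fy = -2.0 * np.sin(x) * np.sin(2.0 * y) + 0.5 * x
    return Fx, Fy

# Directions of the two location shifts.
SHIFTS = {"x-location": np.array([1.0, 0.0]), "y-location": np.array([0.0, 1.0])}
# Assumed generator matrices G: dZ/d(alpha) = G (x - x0) at the identity.
GENERATORS = {
    "x-scale": np.array([[1.0, 0.0], [0.0, 0.0]]),
    "y-scale": np.array([[0.0, 0.0], [0.0, 1.0]]),
    "rotation": np.array([[0.0, 1.0], [-1.0, 0.0]]),  # clockwise
    "shear": np.array([[0.0, 1.0], [1.0, 0.0]]),      # symmetric shear
}

def tangent_formula(p, x0):
    """The six tangent functions written in terms of F_x and F_y."""
    Fx, Fy = grad_F(p)
    dx, dy = p[:, 0] - x0[0], p[:, 1] - x0[1]
    return {
        "x-location": Fx,
        "y-location": Fy,
        "x-scale": dx * Fx,
        "y-scale": dy * Fy,
        "rotation": dy * Fx - dx * Fy,
        "shear": dy * Fx + dx * Fy,
    }

def tangent_numeric(p, x0, name, eps=1e-5):
    """Central difference of F(Z(x)) in one parameter, taken at the identity."""
    def warped(e):
        if name in SHIFTS:
            return p + e * SHIFTS[name]
        return p + e * ((p - x0) @ GENERATORS[name].T)
    return (F(warped(eps)) - F(warped(-eps))) / (2.0 * eps)

def max_errors(n=200, seed=0):
    """Largest absolute gap between formula and finite difference, per tangent."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(-1.0, 1.0, size=(n, 2))
    x0 = np.array([0.2, -0.1])
    exact = tangent_formula(p, x0)
    return {k: float(np.max(np.abs(exact[k] - tangent_numeric(p, x0, k))))
            for k in exact}

if __name__ == "__main__":
    for name, err in max_errors().items():
        print(f"{name:>10s}: max abs error {err:.2e}")
```

Every error should be at the level of finite-difference noise, confirming that each parameter derivative is a combination of \(F_x\) and \(F_y\) weighted by the offsets \(x - x_0\) and \(y - y_0\).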

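(c) For images digitized to \(16 \times 16\) pixels, \(F\) is observed only at lattice points, so \(F_x\) and \(F_y\) must be estimated on that same lattice. We first smooth the image with a two-dimensional kernel smoother (for example, with a Gaussian kernel), then approximate \(F_x\) and \(F_y\) by first differences of the smoothed image. The six tangent images follow by multiplying these estimates by the pixel offsets \((x-x_0)\) and \((y-y_0)\) as in part (b).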
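A minimal sketch of this procedure is given below, under several assumptions that are not in the exercise: a Gaussian product kernel with bandwidth `lam` (in pixels), Nadaraya-Watson normalization of the kernel weights, central differences via `np.gradient`, \(x\) taken as the column index and \(y\) as the row index, and \(x_0\) defaulting to the image center. The helper names are hypothetical.

```python
import numpy as np

def gaussian_smoother_matrix(n, lam):
    """Row-normalized Gaussian kernel weights on a 1-D lattice of n pixels."""
    idx = np.arange(n)
    W = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / lam) ** 2)
    return W / W.sum(axis=1, keepdims=True)

def tangent_images(img, lam=1.0, center=None):
    """Estimate the six tangent images of a pixel image (e.g. 16 x 16)."""
    img = np.asarray(img, dtype=float)
    n_rows, n_cols = img.shape
    # Product-kernel Nadaraya-Watson smoother: smooth rows, then columns.
    Sy = gaussian_smoother_matrix(n_rows, lam)
    Sx = gaussian_smoother_matrix(n_cols, lam)
    smooth = Sy @ img @ Sx.T
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    Fy, Fx = np.gradient(smooth)
    if center is None:
        center = ((n_cols - 1) / 2.0, (n_rows - 1) / 2.0)  # (x0, y0), assumed
    y, x = np.mgrid[0:n_rows, 0:n_cols]
    dx, dy = x - center[0], y - center[1]
    return {
        "x-location": Fx,
        "y-location": Fy,
        "x-scale": dx * Fx,
        "y-scale": dy * Fy,
        "rotation": dy * Fx - dx * Fy,
        "shear": dy * Fx + dx * Fy,
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.uniform(size=(16, 16))  # stand-in for a digitized image
    for name, t in tangent_images(image, lam=1.0).items():
        print(name, t.shape)
```

The bandwidth `lam` trades noise reduction against blurring of fine strokes. Estimates near the image border are less reliable, because the kernel weights are truncated there and `np.gradient` falls back to one-sided differences.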

See \cite{hastie1997metrics} for a detailed discussion.