Ex. 8.7

EM as a minorization algorithm(Hunter and Lange, 2004; The MM alternative to EM). A function \(g(x, y)\) to said to minorize a function \(f(x)\) is

\[\begin{equation} g(x,y) \le f(x), \ g(x,x) = f(x)\non \end{equation}\]

for all \(x, y\) in the domain. This is useful for maximizing \(f(x)\) since is easy to show that \(f(x)\) is non-decreasing under the update

\[\begin{equation} x^{s+1} = \underset{x}{\operatorname{argmax}}g(x,x^s)\non. \end{equation}\]

There are analogous definitions for majorization, for minimizing a function \(f(x)\). The resulting algorithms are known as \(MM\) algorithms, for Minorize-Maximize or Majorize-Minimize.

Show that the EM algorithm (Section 8.5.2) is an example of an MM algorithm, using \(Q(\theta', \theta) + \log \text{Pr}(\bb{Z}|\theta) - Q(\theta, \theta)\) to minorize the observed data log-likelihood \(\ell(\theta';\bb{Z})\). (Note that only the first term involves the relevant parameter \(\theta'\)).

Soln. 8.7

Denote

\[\begin{eqnarray} g(\theta',\theta) &=& Q(\theta', \theta) + \log \text{Pr}(\bb{Z}|\theta) - Q(\theta, \theta)\non\\ f(\theta) &=& \ell(\theta';\bb{Z}).\non \end{eqnarray}\]

It suffices to show that

(a) \(g(\theta', \theta) \le f(\theta')\) for any \(\theta, \theta'\).

(b) \(g(\theta, \theta) = f(\theta)\) for any \(\theta\).

Since (b) is trivial from our definitions, the rest proof focuses on (a). Recall (8.47) in the text and Ex. 8.1, we know \(R(\theta', \theta)\le R(\theta, \theta)\) so

\[\begin{eqnarray} f(\theta') - f(\theta) &=& \ell(\theta';\bb{Z}) - \ell(\theta;\bb{Z})\non\\ &=& [Q(\theta', \theta) - Q(\theta, \theta)] - [R(\theta', \theta) - R(\theta, \theta)]\non\\ &\ge& [Q(\theta', \theta) - Q(\theta, \theta)].\non \end{eqnarray}\]

A simple algebra shows that \(g(\theta', \theta) \le f(\theta')\) and the proof is complete.