Ex. 14.23
Algorithm for non-negative matrix factorization (Wu and Lange, 2007). A function \(g(x, y)\) is said to minorize a function \(f(x)\) if
\[\begin{equation}
g(x, y) \le f(x), \ g(x,x) = f(x)\non
\end{equation}\]
for all \(x, y\) in the domain. This is useful for maximizing \(f(x)\) since it is easy to show that \(f(x)\) is nondecreasing under the update
\[\begin{equation}
x^{s+1} = \underset{x}{\operatorname{argmax}} g(x,x^s).\non
\end{equation}\]
There are analogous definitions for majorization, for minimizing a function \(f(x)\). The resulting algorithms are known as MM algorithms, for minorize-maximize or majorize-minimize (Lange, 2004). It also can be shown that the EM algorithm (8.5) is an example of an MM algorithm: see Section 8.5.3 and Exercise 8.2 for details.
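For reference, the nondecreasing property mentioned above follows in one line from the two defining properties of \(g\):
\[\begin{equation}
f(x^{s+1}) \ge g(x^{s+1}, x^s) \ge g(x^s, x^s) = f(x^s),\non
\end{equation}\]
where the first inequality uses \(g(x,y)\le f(x)\) and the second holds because \(x^{s+1}\) maximizes \(g(x, x^s)\) over \(x\).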
(a) Consider maximization of the function \(L(\bb{W}, \bb{H})\) in (14.73), written here without the matrix notation
\[\begin{equation}
L(\bb{W}, \bb{H}) = \sum_{i=1}^N\sum_{j=1}^p\left[x_{ij}\log\left(\sum_{k=1}^rw_{ik}h_{kj}\right)-\sum_{k=1}^rw_{ik}h_{kj}\right].\non
\end{equation}\]
Using the concavity of \(\log(x)\), show that for any set of \(r\) values \(y_k\ge 0\) and \(0\le c_k \le 1\) with \(\sum_{k=1}^r c_k = 1\),
\[\begin{equation}
\log\left(\sum_{k=1}^r y_k\right)\ge \sum_{k=1}^r c_k \log(y_k/c_k).\non
\end{equation}\]
Hence
\[\begin{equation}
\log\left(\sum_{k=1}^r w_{ik}h_{kj}\right)\ge \sum_{k=1}^r\frac{a^s_{ikj}}{b^s_{ij}}\log\left(\frac{b^s_{ij}}{a^s_{ikj}}w_{ik}h_{kj}\right),\non
\end{equation}\]
where
\[\begin{equation}
a^s_{ikj} = w^s_{ik}h^s_{kj} \ \text{and} \ b^s_{ij} = \sum_{k=1}^rw_{ik}^sh^s_{kj},\non
\end{equation}\]
and \(s\) indicates the current iteration.
(b) Hence show that, ignoring constants, the function
\[\begin{eqnarray}
g(\bW, \bb{H}|\bW^s, \bb{H}^s) &=& \sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rx_{ij}\frac{a^s_{ikj}}{b^s_{ij}}\left(\log w_{ik} + \log h_{kj}\right)\non\\
&&\ \ \ \ - \sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rw_{ik}h_{kj}\non
\end{eqnarray}\]
minorizes \(L(\bW, \bb{H})\).
(c) Set the partial derivatives of \(g(\bW, \bb{H}|\bW^s, \bb{H}^s)\) to zero and hence derive the updating steps (14.74).
Soln. 14.23
(a) Suppose first that \(f(x)\) is convex. We claim that
\[\begin{equation}
\label{eq:14-23a}
f\left(\sum_{i=1}^nc_ix_i\right) \le \sum_{i=1}^nc_if(x_i),
\end{equation}\]
where \(\sum_{i=1}^nc_i = 1\) and \(c_i\ge 0\) for \(i=1,\dots,n\). We prove \(\eqref{eq:14-23a}\) by induction on \(n\). The case \(n=2\) is the definition of convexity. For \(n > 2\), assume the claim holds for \(n-1\) terms. If \(c_n = 0\) or \(c_n = 1\) the statement reduces to a case with fewer terms, so we may assume \(0 < c_n < 1\). Write \(c_0 = 1-c_n > 0\) and \(x_0 = \sum_{i=1}^{n-1}c_ix_i/c_0\), and note that the weights \(c_i/c_0\), \(i=1,\dots,n-1\), are nonnegative and sum to one.
Then we have
\[\begin{eqnarray}
f\left(\sum_{i=1}^nc_ix_i\right) &=& f\left(\sum_{i=1}^{n-1}c_ix_i + c_n x_n\right)\non\\
&=&f(c_0x_0 + c_nx_n)\non\\
&\le& c_0f(x_0) + c_nf(x_n)\non\\
&=&c_0 f\left(\sum_{i=1}^{n-1}c_ix_i/c_0\right) + c_nf(x_n)\non\\
&\le& c_0 \frac{1}{c_0}\sum_{i=1}^{n-1}c_if(x_i) + c_n f(x_n)\non\\
&=&\sum_{i=1}^nc_if(x_i),\non
\end{eqnarray}\]
which is \(\eqref{eq:14-23a}\).
For a concave function \(f\), applying \(\eqref{eq:14-23a}\) to the convex function \(-f\) reverses the inequality:
\[\begin{equation}
f\left(\sum_{i=1}^nc_ix_i\right) \ge \sum_{i=1}^nc_if(x_i).\non
\end{equation}\]
Now take \(f = \log\), which is concave. For any \(y_k \ge 0\) and \(0\le c_k\le 1\) with \(\sum_{k=1}^r c_k = 1\),
\[\begin{equation}
\log\left(\sum_{k=1}^r y_k\right) = \log\left(\sum_{k=1}^r c_k\cdot\frac{y_k}{c_k}\right)\ge \sum_{k=1}^r c_k\log(y_k/c_k),\non
\end{equation}\]
which is the first inequality in (a). Setting \(y_k = w_{ik}h_{kj}\) and \(c_k = a^s_{ikj}/b^s_{ij}\), which are nonnegative and sum to one over \(k\), yields the second inequality.
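As a quick numerical sanity check of the inequality above, here is a minimal sketch assuming NumPy; the distributions, sample sizes, and tolerance are arbitrary choices.

```python
import numpy as np

# Check log(sum_k y_k) >= sum_k c_k * log(y_k / c_k)
# for y_k >= 0 and weights c_k on the simplex.
rng = np.random.default_rng(0)
for _ in range(1000):
    r = rng.integers(2, 10)
    y = rng.gamma(1.0, 1.0, size=r)      # arbitrary nonnegative y_k
    c = rng.dirichlet(np.ones(r))        # c_k >= 0, sum_k c_k = 1
    lhs = np.log(y.sum())
    rhs = np.sum(c * np.log(y / c))
    assert lhs >= rhs - 1e-12
```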
(b) The second term of \(g\), namely \(-\sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rw_{ik}h_{kj}\), appears unchanged in \(L(\bb{W}, \bb{H})\), so we only need to compare the first terms.
Note that
\[\begin{eqnarray}
\sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rx_{ij}\frac{a^s_{ikj}}{b^s_{ij}}\left(\log w_{ik} + \log h_{kj}\right) &=& \sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rx_{ij}\frac{a^s_{ikj}}{b^s_{ij}}\log\left(w_{ik}h_{kj}\right)\non\\
&=&\sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rx_{ij}\frac{a^s_{ikj}}{b^s_{ij}}\log \left(\frac{b^s_{ij}}{a^s_{ikj}}w_{ik}h_{kj}\right)- C,\non
\end{eqnarray}\]
where the constant \(C\) (independent of \(\bW\) and \(\bb{H}\)) is
\[\begin{equation}
C = \sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rx_{ij}\frac{a^s_{ikj}}{b^s_{ij}}\log \frac{b^s_{ij}}{a^s_{ikj}}.\non
\end{equation}\]
Therefore, multiplying the inequality from (a) by \(x_{ij}\ge 0\) and summing over \(i\) and \(j\), we obtain
\[\begin{equation}
g(\bW, \bb{H}|\bW^s, \bb{H}^s) + C \le L(\bb{W}, \bb{H}),\non
\end{equation}\]
with equality when \(\bW = \bW^s\) and \(\bb{H} = \bb{H}^s\): in that case \(\frac{b^s_{ij}}{a^s_{ikj}}w_{ik}h_{kj} = b^s_{ij}\) and \(\sum_{k=1}^ra^s_{ikj}/b^s_{ij} = 1\), so both sides reduce to \(L(\bW^s, \bb{H}^s)\). Since \(C\) does not depend on \((\bW, \bb{H})\), this shows that \(g\) minorizes \(L\) up to a constant, which is the sense of "ignoring constants" in (b).
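To make the minorization concrete, here is a small numerical check that \(g + C \le L\) with equality at the current iterate. This is a sketch assuming NumPy; the dimensions, seeds, and random data are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, r = 6, 5, 3
X = rng.gamma(2.0, 1.0, size=(N, p))

def L(W, H):
    WH = W @ H
    return np.sum(X * np.log(WH) - WH)

def g_plus_C(W, H, Ws, Hs):
    # a[i,k,j] = Ws[i,k]*Hs[k,j],  b[i,j] = (Ws @ Hs)[i,j],  c = a/b sums to 1 over k
    a = Ws[:, :, None] * Hs[None, :, :]
    b = (Ws @ Hs)[:, None, :]
    c = a / b
    # g + C = sum_{ijk} x_ij c_ikj log(w_ik h_kj / c_ikj) - sum_{ij} (WH)_ij
    val = np.sum(X[:, None, :] * c * np.log((W[:, :, None] * H[None, :, :]) / c))
    return val - np.sum(W @ H)

Ws, Hs = rng.gamma(1.0, 1.0, (N, r)), rng.gamma(1.0, 1.0, (r, p))
W, H = rng.gamma(1.0, 1.0, (N, r)), rng.gamma(1.0, 1.0, (r, p))
assert g_plus_C(W, H, Ws, Hs) <= L(W, H) + 1e-10       # minorization
assert np.isclose(g_plus_C(Ws, Hs, Ws, Hs), L(Ws, Hs))  # equality at current iterate
```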
(c) Fix \(i\) and \(k\). Setting the partial derivative of \(g\) with respect to \(w_{ik}\) to zero, we have
\[\begin{eqnarray}
\frac{\partial g(\bW, \bb{H}|\bW^s, \bb{H}^s)}{\partial w_{ik}} &=& \sum_{j=1}^p x_{ij}\frac{a^s_{ikj}}{b^s_{ij}}\frac{1}{w_{ik}} - \sum_{j=1}^ph_{kj}\non\\
&=&0,\non
\end{eqnarray}\]
so that, evaluating \(h_{kj}\) on the right-hand side at its current value \(h^s_{kj}\) (in (14.74) every quantity on the right is the current iterate, with the superscript \(s\) suppressed),
\[\begin{eqnarray}
w^{s+1}_{ik} &=& \frac{\sum_{j=1}^px_{ij}\frac{a^s_{ikj}}{b^s_{ij}}}{\sum_{j=1}^ph^s_{kj}}\non\\
&=&\left(\sum_{j=1}^px_{ij} \frac{w^s_{ik}h^s_{kj}}{\sum_{k'=1}^rw^s_{ik'}h^s_{k'j}}\right)\Bigg/\sum_{j=1}^ph^s_{kj}\non\\
&=&w_{ik}^s\frac{\sum_{j=1}^ph^s_{kj}x_{ij}/(\bW^s\bb{H}^s)_{ij}}{\sum_{j=1}^ph^s_{kj}},\non
\end{eqnarray}\]
which is the first update in (14.74). The same argument applied to \(\partial g/\partial h_{kj} = 0\), with \(w_{ik}\) evaluated at \(w^s_{ik}\), gives the update for \(h_{kj}\); we omit the details for brevity.
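For illustration, a minimal NumPy sketch of the resulting multiplicative updates in (14.74). The function name, initialization, iteration count, and the small constant guarding against division by zero are arbitrary choices, not from the text.

```python
import numpy as np

def nmf(X, r, n_iter=200, eps=1e-12):
    """Alternate the multiplicative updates (14.74) for W (N x r) and H (r x p)."""
    rng = np.random.default_rng(0)
    N, p = X.shape
    W = rng.uniform(0.1, 1.0, size=(N, r))
    H = rng.uniform(0.1, 1.0, size=(r, p))
    for _ in range(n_iter):
        # w_ik <- w_ik * [sum_j h_kj x_ij / (WH)_ij] / [sum_j h_kj]
        W *= (X / (W @ H + eps)) @ H.T / (H.sum(axis=1) + eps)
        # h_kj <- h_kj * [sum_i w_ik x_ij / (WH)_ij] / [sum_i w_ik]
        H *= W.T @ (X / (W @ H + eps)) / (W.sum(axis=0)[:, None] + eps)
    return W, H

# Usage on arbitrary nonnegative data: the reconstruction error should be small.
X = np.random.default_rng(1).gamma(2.0, 1.0, size=(20, 10))
W, H = nmf(X, r=3)
print(np.abs(X - W @ H).mean())
```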