Ex. 14.23
Algorithm for non-negative matrix factorization (Wu and Lange, 2007). A function \(g(x, y)\) is said to minorize a function \(f(x)\) if
\[\begin{equation}
g(x, y) \le f(x), \ g(x,x) = f(x)\non
\end{equation}\]
for all \(x, y\) in the domain. This is useful for maximizing \(f(x)\) since it is easy to show that \(f(x)\) is nondecreasing under the update
\[\begin{equation}
x^{s+1} = \underset{x}{\operatorname{argmax}} g(x,x^s).\non
\end{equation}\]
There are analogous definitions for majorization, for minimizing a function \(f(x)\). The resulting algorithms are known as MM algorithms, for minorize-maximize or majorize-minimize (Lange, 2004). It also can be shown that the EM algorithm (8.5) is an example of an MM algorithm: see Section 8.5.3 and Exercise 8.2 for details.
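For reference, the nondecreasing property mentioned above follows in one line from the two defining properties of \(g\):
\[\begin{equation}
f(x^{s+1}) \ge g(x^{s+1}, x^s) \ge g(x^s, x^s) = f(x^s),\non
\end{equation}\]
where the first inequality uses \(g(x,y)\le f(x)\) and the second holds because \(x^{s+1}\) maximizes \(g(x, x^s)\) over \(x\).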
(a) Consider maximization of the function \(L(\bb{W}, \bb{H})\) in (14.73), written here without the matrix notation
\[\begin{equation}
L(\bb{W}, \bb{H}) = \sum_{i=1}^N\sum_{j=1}^p\left[x_{ij}\log\left(\sum_{k=1}^rw_{ik}h_{kj}\right)-\sum_{k=1}^rw_{ik}h_{kj}\right].\non
\end{equation}\]
Using the concavity of \(\log(x)\), show that for any set of \(r\) values \(y_k\ge 0\) and \(0\le c_k \le 1\) with \(\sum_{k=1}^r c_k = 1\),
\[\begin{equation}
\log\left(\sum_{k=1}^r y_k\right)\ge \sum_{k=1}^r c_k \log(y_k/c_k).\non
\end{equation}\]
Hence
\[\begin{equation}
\log\left(\sum_{k=1}^r w_{ik}h_{kj}\right)\ge \sum_{k=1}^r\frac{a^s_{ikj}}{b^s_{ij}}\log\left(\frac{b^s_{ij}}{a^s_{ikj}}w_{ik}h_{kj}\right),\non
\end{equation}\]
where
\[\begin{equation}
a^s_{ikj} = w^s_{ik}h^s_{kj} \ \text{and} \ b^s_{ij} = \sum_{k=1}^rw_{ik}^sh^s_{kj},\non
\end{equation}\]
and \(s\) indicates the current iteration.
(b) Hence show that, ignoring constants, the function
\[\begin{eqnarray}
g(\bW, \bb{H}|\bW^s, \bb{H}^s) &=& \sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rx_{ij}\frac{a^s_{ikj}}{b^s_{ij}}\left(\log w_{ik} + \log h_{kj}\right)\non\\
&&\ \ \ \ - \sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rw_{ik}h_{kj}\non
\end{eqnarray}\]
minorizes \(L(\bW, \bb{H})\).
(c) Set the partial derivatives of \(g(\bW, \bb{H}|\bW^s, \bb{H}^s)\) to zero and hence derive the updating steps (14.74).
Soln. 14.23
(a) Suppose first that \(f(x)\) is convex. We claim that
\[\begin{equation}
\label{eq:14-23a}
f\left(\sum_{i=1}^nc_ix_i\right) \le \sum_{i=1}^nc_if(x_i),
\end{equation}\]
where \(\sum_{i=1}^nc_i = 1\) and \(c_i\ge 0\) for \(i=1,\dots,n\). We prove \(\eqref{eq:14-23a}\) by induction on \(n\). The case \(n=2\) is the definition of convexity. For \(n > 2\), assume the claim holds for \(n-1\) terms. If \(c_n = 0\) or \(c_n = 1\) the statement reduces to a case with fewer terms, so we may assume \(0 < c_n < 1\). Write \(c_0 = 1-c_n > 0\) and \(x_0 = \sum_{i=1}^{n-1}c_ix_i/c_0\), and note that the weights \(c_i/c_0\), \(i=1,\dots,n-1\), are nonnegative and sum to one.
Then we have
\[\begin{eqnarray}
f\left(\sum_{i=1}^nc_ix_i\right) &=& f\left(\sum_{i=1}^{n-1}c_ix_i + c_n x_n\right)\non\\
&=&f(c_0x_0 + c_nx_n)\non\\
&\le& c_0f(x_0) + c_nf(x_n)\non\\
&=&c_0 f\left(\sum_{i=1}^{n-1}c_ix_i/c_0\right) + c_nf(x_n)\non\\
&\le& c_0 \frac{1}{c_0}\sum_{i=1}^{n-1}c_if(x_i) + c_n f(x_n)\non\\
&=&\sum_{i=1}^nc_if(x_i),\non
\end{eqnarray}\]
which is \(\eqref{eq:14-23a}\).
For a concave function \(f\), applying \(\eqref{eq:14-23a}\) to the convex function \(-f\) reverses the inequality:
\[\begin{equation}
f\left(\sum_{i=1}^nc_ix_i\right) \ge \sum_{i=1}^nc_if(x_i).\non
\end{equation}\]
Now take \(f = \log\), which is concave. For any \(y_k \ge 0\) and \(0\le c_k\le 1\) with \(\sum_{k=1}^r c_k = 1\),
\[\begin{equation}
\log\left(\sum_{k=1}^r y_k\right) = \log\left(\sum_{k=1}^r c_k\cdot\frac{y_k}{c_k}\right)\ge \sum_{k=1}^r c_k\log(y_k/c_k),\non
\end{equation}\]
which is the first inequality in (a). Setting \(y_k = w_{ik}h_{kj}\) and \(c_k = a^s_{ikj}/b^s_{ij}\), which are nonnegative and sum to one over \(k\), yields the second inequality.
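As a quick numerical sanity check of the inequality above, here is a minimal sketch assuming NumPy; the distributions, sample sizes, and tolerance are arbitrary choices.

```python
import numpy as np

# Check log(sum_k y_k) >= sum_k c_k * log(y_k / c_k)
# for y_k >= 0 and weights c_k on the simplex.
rng = np.random.default_rng(0)
for _ in range(1000):
    r = rng.integers(2, 10)
    y = rng.gamma(1.0, 1.0, size=r)      # arbitrary nonnegative y_k
    c = rng.dirichlet(np.ones(r))        # c_k >= 0, sum_k c_k = 1
    lhs = np.log(y.sum())
    rhs = np.sum(c * np.log(y / c))
    assert lhs >= rhs - 1e-12
```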
(b) The second term of \(g\), namely \(-\sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rw_{ik}h_{kj}\), appears unchanged in \(L(\bb{W}, \bb{H})\), so we only need to compare the first terms.
Note that
\[\begin{eqnarray}
\sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rx_{ij}\frac{a^s_{ikj}}{b^s_{ij}}\left(\log w_{ik} + \log h_{kj}\right) &=& \sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rx_{ij}\frac{a^s_{ikj}}{b^s_{ij}}\log\left(w_{ik}h_{kj}\right)\non\\
&=&\sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rx_{ij}\frac{a^s_{ikj}}{b^s_{ij}}\log \left(\frac{b^s_{ij}}{a^s_{ikj}}w_{ik}h_{kj}\right)- C,\non
\end{eqnarray}\]
where the constant \(C\) (independent of \(\bW\) and \(\bb{H}\)) is
\[\begin{equation}
C = \sum_{i=1}^N\sum_{j=1}^p\sum_{k=1}^rx_{ij}\frac{a^s_{ikj}}{b^s_{ij}}\log \frac{b^s_{ij}}{a^s_{ikj}}.\non
\end{equation}\]
Therefore, multiplying the inequality from (a) by \(x_{ij}\ge 0\) and summing over \(i\) and \(j\), we obtain
\[\begin{equation}
g(\bW, \bb{H}|\bW^s, \bb{H}^s) + C \le L(\bb{W}, \bb{H}),\non
\end{equation}\]
with equality when \(\bW = \bW^s\) and \(\bb{H} = \bb{H}^s\): in that case \(\frac{b^s_{ij}}{a^s_{ikj}}w_{ik}h_{kj} = b^s_{ij}\) and \(\sum_{k=1}^ra^s_{ikj}/b^s_{ij} = 1\), so both sides reduce to \(L(\bW^s, \bb{H}^s)\). Since \(C\) does not depend on \((\bW, \bb{H})\), this shows that \(g\) minorizes \(L\) up to a constant, which is the sense of "ignoring constants" in (b).
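To make the minorization concrete, here is a small numerical check that \(g + C \le L\) with equality at the current iterate. This is a sketch assuming NumPy; the dimensions, seeds, and random data are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, r = 6, 5, 3
X = rng.gamma(2.0, 1.0, size=(N, p))

def L(W, H):
    WH = W @ H
    return np.sum(X * np.log(WH) - WH)

def g_plus_C(W, H, Ws, Hs):
    # a[i,k,j] = Ws[i,k]*Hs[k,j],  b[i,j] = (Ws @ Hs)[i,j],  c = a/b sums to 1 over k
    a = Ws[:, :, None] * Hs[None, :, :]
    b = (Ws @ Hs)[:, None, :]
    c = a / b
    # g + C = sum_{ijk} x_ij c_ikj log(w_ik h_kj / c_ikj) - sum_{ij} (WH)_ij
    val = np.sum(X[:, None, :] * c * np.log((W[:, :, None] * H[None, :, :]) / c))
    return val - np.sum(W @ H)

Ws, Hs = rng.gamma(1.0, 1.0, (N, r)), rng.gamma(1.0, 1.0, (r, p))
W, H = rng.gamma(1.0, 1.0, (N, r)), rng.gamma(1.0, 1.0, (r, p))
assert g_plus_C(W, H, Ws, Hs) <= L(W, H) + 1e-10       # minorization
assert np.isclose(g_plus_C(Ws, Hs, Ws, Hs), L(Ws, Hs))  # equality at current iterate
```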
(c) Fix \(i\) and \(k\). Setting the partial derivative of \(g\) with respect to \(w_{ik}\) to zero, we have
\[\begin{eqnarray}
\frac{\partial g(\bW, \bb{H}|\bW^s, \bb{H}^s)}{\partial w_{ik}} &=& \sum_{j=1}^p x_{ij}\frac{a^s_{ikj}}{b^s_{ij}}\frac{1}{w_{ik}} - \sum_{j=1}^ph_{kj}\non\\
&=&0,\non
\end{eqnarray}\]
so that, evaluating \(h_{kj}\) on the right-hand side at its current value \(h^s_{kj}\) (in (14.74) every quantity on the right is the current iterate, with the superscript \(s\) suppressed),
\[\begin{eqnarray}
w^{s+1}_{ik} &=& \frac{\sum_{j=1}^px_{ij}\frac{a^s_{ikj}}{b^s_{ij}}}{\sum_{j=1}^ph^s_{kj}}\non\\
&=&\left(\sum_{j=1}^px_{ij} \frac{w^s_{ik}h^s_{kj}}{\sum_{k'=1}^rw^s_{ik'}h^s_{k'j}}\right)\Bigg/\sum_{j=1}^ph^s_{kj}\non\\
&=&w_{ik}^s\frac{\sum_{j=1}^ph^s_{kj}x_{ij}/(\bW^s\bb{H}^s)_{ij}}{\sum_{j=1}^ph^s_{kj}},\non
\end{eqnarray}\]
which is the first update in (14.74). The same argument applied to \(\partial g/\partial h_{kj} = 0\), with \(w_{ik}\) evaluated at \(w^s_{ik}\), gives the update for \(h_{kj}\); we omit the details for brevity.
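For illustration, a minimal NumPy sketch of the resulting multiplicative updates in (14.74). The function name, initialization, iteration count, and the small constant guarding against division by zero are arbitrary choices, not from the text.

```python
import numpy as np

def nmf(X, r, n_iter=200, eps=1e-12):
    """Alternate the multiplicative updates (14.74) for W (N x r) and H (r x p)."""
    rng = np.random.default_rng(0)
    N, p = X.shape
    W = rng.uniform(0.1, 1.0, size=(N, r))
    H = rng.uniform(0.1, 1.0, size=(r, p))
    for _ in range(n_iter):
        # w_ik <- w_ik * [sum_j h_kj x_ij / (WH)_ij] / [sum_j h_kj]
        W *= (X / (W @ H + eps)) @ H.T / (H.sum(axis=1) + eps)
        # h_kj <- h_kj * [sum_i w_ik x_ij / (WH)_ij] / [sum_i w_ik]
        H *= W.T @ (X / (W @ H + eps)) / (W.sum(axis=0)[:, None] + eps)
    return W, H

# Usage on arbitrary nonnegative data: the reconstruction error should be small.
X = np.random.default_rng(1).gamma(2.0, 1.0, size=(20, 10))
W, H = nmf(X, r=3)
print(np.abs(X - W @ H).mean())
```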