Ex. 14.2
Consider a mixture model density in \(p\)-dimensional feature space,
\[
	g(x) = \sum_{k=1}^K \pi_k g_k(x),
\]
where \(g_k=N(\mu_k, \bb{I}\cdot \sigma^2)\) and \(\pi_k\ge 0 \ \forall k\) with \(\sum_{k}\pi_k = 1\). Here \(\{\mu_k, \pi_k\}, k=1,...,K\) and \(\sigma^2\) are unknown parameters.
Suppose we have data \(x_1, x_2, ..., x_N\sim g(x)\) and we wish to fit the mixture model.
- Write down the log-likelihood of the data.
- Derive an EM algorithm for computing the maximum likelihood estimates (see Section 8.1).
- Show that if \(\sigma\) has a known value in the mixture model and we take \(\sigma\ra 0\), then in a sense this EM algorithm coincides with \(K\)-means clustering.
Soln. 14.2
This is similar to Ex. 13.1.
(1) The log-likelihood of the data is
\[
	\ell(\mu, \pi, \sigma^2) = \sum_{i=1}^N \log\left(\sum_{k=1}^K \pi_k g_k(x_i)\right),
\]
where \(g_k(x_i) = (2\pi\sigma^2)^{-p/2}\exp\left(-\frac{\|x_i-\mu_k\|^2}{2\sigma^2}\right)\).
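As a quick numerical check of this expression, here is a minimal NumPy sketch; the function name `mixture_log_likelihood` and the array shapes are my own choices, not part of the exercise.

```python
import numpy as np

def mixture_log_likelihood(X, mu, pi, sigma2):
    """Log-likelihood of a spherical Gaussian mixture (illustrative sketch).

    X: (N, p) data, mu: (K, p) means, pi: (K,) mixing weights, sigma2: scalar variance.
    """
    N, p = X.shape
    # Squared distances ||x_i - mu_k||^2, shape (N, K)
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # Spherical Gaussian densities g_k(x_i), shape (N, K)
    g = (2 * np.pi * sigma2) ** (-p / 2) * np.exp(-d2 / (2 * sigma2))
    # sum over observations of log sum_k pi_k g_k(x_i)
    return np.log(g @ pi).sum()
```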
(2) The EM algorithm is an iterate-until-convergence process, and each iteration consists of two steps, Expectation and Maximization. For the Gaussian mixture model, the Expectation step computes the responsibility of class \(k\) for observation \(x_i\) as
\[
	r_i^{(k)} = \frac{\hat\pi_k\, \hat g_k(x_i)}{\sum_{j=1}^K \hat\pi_j\, \hat g_j(x_i)},
\]
where \(\hat g_j(x) = N(\hat \mu_j, \bb{I}\cdot \hat\sigma^2)\).
The Maximization step then computes the weighted means and variance, together with the mixing proportions, as
\[
	\hat\mu_k = \frac{\sum_{i=1}^N r_i^{(k)} x_i}{\sum_{i=1}^N r_i^{(k)}},\qquad
	\hat\sigma^2 = \frac{1}{Np}\sum_{i=1}^N\sum_{k=1}^K r_i^{(k)}\,\|x_i-\hat\mu_k\|^2,\qquad
	\hat\pi_k = \frac{1}{N}\sum_{i=1}^N r_i^{(k)}.
\]
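The two steps above can be sketched in a few lines of NumPy. This is an illustrative implementation for the spherical model only; all names (`em_step`, `r`, `Nk`) are mine.

```python
import numpy as np

def em_step(X, mu, pi, sigma2):
    """One EM iteration for the spherical Gaussian mixture (sketch, not reference code)."""
    N, p = X.shape

    # E-step: responsibilities r[i, k] proportional to pi_k * g_k(x_i)
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)          # (N, K)
    log_g = -0.5 * p * np.log(2 * np.pi * sigma2) - d2 / (2 * sigma2)
    log_w = np.log(pi) + log_g
    log_w -= log_w.max(axis=1, keepdims=True)                         # numerical stabilization
    r = np.exp(log_w)
    r /= r.sum(axis=1, keepdims=True)                                 # (N, K)

    # M-step: weighted means, common variance, mixing proportions
    Nk = r.sum(axis=0)                                                # (K,)
    mu_new = (r.T @ X) / Nk[:, None]
    d2_new = ((X[:, None, :] - mu_new[None, :, :]) ** 2).sum(axis=2)
    sigma2_new = (r * d2_new).sum() / (N * p)
    pi_new = Nk / N
    return mu_new, pi_new, sigma2_new, r
```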
Note that for the general case, where the covariance \(\Sigma_k\) may vary with \(k\), the estimate above becomes
\[
	\hat\Sigma_k = \frac{\sum_{i=1}^N r_i^{(k)}\,(x_i-\hat\mu_k)(x_i-\hat\mu_k)^T}{\sum_{i=1}^N r_i^{(k)}}.
\]
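For this general-covariance variant, the corresponding M-step update can be sketched as follows (an illustrative helper, with names of my choosing):

```python
import numpy as np

def update_covariances(X, r, mu):
    """Sigma_k = sum_i r_ik (x_i - mu_k)(x_i - mu_k)^T / sum_i r_ik  (sketch)."""
    Nk = r.sum(axis=0)                       # (K,)
    diff = X[:, None, :] - mu[None, :, :]    # (N, K, p)
    # Responsibility-weighted outer products, accumulated over observations
    Sigma = np.einsum('nk,nkp,nkq->kpq', r, diff, diff) / Nk[:, None, None]
    return Sigma                             # (K, p, p)
```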
(3) It is easy to verify that, as \(\sigma\ra 0\), the ratio \(\hat g_j(x_i)/\hat g_k(x_i) = \exp\left(-\frac{\|x_i-\hat\mu_j\|^2 - \|x_i-\hat\mu_k\|^2}{2\sigma^2}\right)\ra 0\) whenever cluster \(k\) is strictly closer to \(x_i\) than cluster \(j\). Hence \(r_i^{(k)}\ra 1\) for the cluster \(k\) whose mean is closest to \(x_i\), and \(r_i^{(k)}\ra 0\) for all other clusters. The E-step thus reduces to a hard assignment of each point to its nearest mean, and the M-step reduces to recomputing the cluster means, so the \(K\)-means algorithm is recovered.
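A small self-contained sketch of this limit (the toy data and parameter values are mine); since the normalizing constant \((2\pi\sigma^2)^{-p/2}\) is shared by all components, it cancels in the responsibilities and only the squared distances matter:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated toy clusters in the plane
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
mu = np.array([[0.5, 0.5], [2.5, 2.5]])
pi = np.array([0.5, 0.5])
d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (N, K)

for sigma2 in [1.0, 0.1, 1e-3]:
    log_w = np.log(pi) - d2 / (2 * sigma2)                 # shared constants cancel
    log_w -= log_w.max(axis=1, keepdims=True)
    r = np.exp(log_w)
    r /= r.sum(axis=1, keepdims=True)
    # Average responsibility that each point places on its closest mean
    print(sigma2, r.max(axis=1).mean().round(4))
# As sigma2 -> 0 this tends to 1: the E-step becomes the K-means assignment step.
```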