Ex. 13.1

Consider a Gaussian mixture model where the covariance matrices are assumed to be scalar: \(\Sigma_r =\sigma\bb{I} \ \forall r=1,...,R\), and \(\sigma\) is a fixed parameter. Discuss the analogy between the \(K\)-means clustering algorithm and the EM algorithm for fitting this mixture model in detail. Show that in the limit \(\sigma\ra0\) the two methods coincide.

Soln. 13.1

The analogy between EM algorithm (see Section 8.5 for details) and \(K\)-means algorithm is summarized in Section 13.2.3. In short, EM algorithm can be regarded as a soft version of \(K\)-means for Gaussian mixture model. The EM algorithm uses responsibilities to make a soft assignment of each data point to one of the clusters. When \(\sigma\) is fixed, responsibility of data point \(i\) assigning to cluster \(k\) is given by

\[\begin{equation} r_i^{(k)} = \frac{\exp(-\frac{1}{2\sigma^2}\|y_i - \mu_k\|^2)}{\sum_{l=1}^K\exp(-\frac{1}{2\sigma^2}\|y_i-\mu_l\|^2)}.\non \end{equation}\]

It's easy to very that, as \(\sigma\ra 0\), \(r_i^{(k)}\ra 1\) for the cluster \(k\) that is closest to \(y_i\) and \(r_i^{(k)}\ra 0\) for other clusters. Therefore \(K\)-means algorithm is recovered as \(\sigma \ra 0\).