Ex. 2.2
Show how to compute the Bayes decision boundary for the simulation example in Figure 2.5.
Soln. 2.2
Let's first recall how the data is generated (starting from the bottom of page 16 in the text).
First, we generate 10 means \(m_k\) from a bivariate Gaussian \(N((1,0)^T, \textbf{I})\) and label this class BLUE. Similarly, we generate 10 more means, denoted \(o_k\), from \(N((0,1)^T, \textbf{I})\) and label this class ORANGE. We regard the \(m_k\) and \(o_k\) as fixed for this problem.
Next, for each color (class), we generate 100 observations in the following way. For each observation, we pick an \(m_k\) (\(o_k\), respectively) at random with probability \(1/10\), and generate a variable with distribution \(N(m_k, \textbf{I}/5)\) (\(N(o_k, \textbf{I}/5)\), respectively), thus leading to a mixture of Gaussian clusters for each class.
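The generative process above can be sketched in a few lines of NumPy. The seed, and the resulting centers, are arbitrary choices for illustration; only the distributional structure mirrors the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# 10 class centers each: BLUE ~ N((1,0)^T, I), ORANGE ~ N((0,1)^T, I)
m = rng.multivariate_normal([1, 0], np.eye(2), size=10)  # BLUE centers m_k
o = rng.multivariate_normal([0, 1], np.eye(2), size=10)  # ORANGE centers o_k

def sample_class(means, n=100):
    """Draw n observations: pick a center uniformly at random (prob 1/10),
    then add N(0, I/5) noise around it."""
    idx = rng.integers(0, len(means), size=n)
    noise = rng.multivariate_normal([0, 0], np.eye(2) / 5, size=n)
    return means[idx] + noise

blue = sample_class(m)      # 100 BLUE observations
orange = sample_class(o)    # 100 ORANGE observations
```

The two resulting clouds are each a 10-component Gaussian mixture, as described above.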
Therefore we have
\begin{equation}\label{eq:2-2blue}
P(X=x\mid \text{BLUE}) = \frac{1}{10}\sum_{i=1}^{10}\phi(x; m_i),
\end{equation}
where \(\phi(\cdot\,; m_i)\) is the density of \(N(m_i, \textbf{I}/5)\).
Similarly,
\begin{equation}\label{eq:2-2organge}
P(X=x\mid \text{ORANGE}) = \frac{1}{10}\sum_{i=1}^{10}\phi(x; o_i).
\end{equation}
The Bayes decision boundary is determined by
\begin{equation}\label{eq:2-2bound}
P(\text{BLUE}\mid X=x) = P(\text{ORANGE}\mid X=x).
\end{equation}
From Bayes' formula we have
\begin{equation}
P(\text{BLUE}\mid X=x) = \frac{P(X=x\mid \text{BLUE})\,P(\text{BLUE})}{P(X=x\mid \text{BLUE})\,P(\text{BLUE}) + P(X=x\mid \text{ORANGE})\,P(\text{ORANGE})}.
\end{equation}
Similarly we have
\begin{equation}
P(\text{ORANGE}\mid X=x) = \frac{P(X=x\mid \text{ORANGE})\,P(\text{ORANGE})}{P(X=x\mid \text{BLUE})\,P(\text{BLUE}) + P(X=x\mid \text{ORANGE})\,P(\text{ORANGE})}.
\end{equation}
Therefore the boundary equation \(\eqref{eq:2-2bound}\) reduces to
\begin{equation}
P(X=x\mid \text{BLUE})\,P(\text{BLUE}) = P(X=x\mid \text{ORANGE})\,P(\text{ORANGE}).
\end{equation}
Note that in this example \(P(\text{BLUE}) = P(\text{ORANGE}) = 1/2\), so we have
\begin{equation}
P(X=x\mid \text{BLUE}) = P(X=x\mid \text{ORANGE}).
\end{equation}
Recalling \(\eqref{eq:2-2blue}\) and \(\eqref{eq:2-2organge}\), the Bayes decision boundary is the set of points \(x\) satisfying \(\sum_{i=1}^{10}\phi(x; m_i) = \sum_{i=1}^{10}\phi(x; o_i)\). Since the means \(m_k\) and \(o_k\) are known, this boundary can be computed numerically, for instance by evaluating both mixture densities on a fine grid and tracing the zero level set of their difference.
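As a numerical sketch of this last step: the code below evaluates the two mixture densities on a grid and computes their difference, whose zero level set is the Bayes boundary. The centers are drawn here only for illustration; in the book's example they are the fixed means used to generate Figure 2.5.

```python
import numpy as np

def mixture_density(x, means, var=1/5):
    """(1/10) * sum_k phi(x; m_k), with phi the N(m_k, var*I) density in 2D.
    x has shape (n, 2), means has shape (10, 2)."""
    d2 = ((x[:, None, :] - means[None, :, :]) ** 2).sum(-1)  # (n, 10)
    phi = np.exp(-d2 / (2 * var)) / (2 * np.pi * var)        # bivariate normal
    return phi.mean(axis=1)                                  # average over the 10 components

# illustrative centers (in the actual example these are the fixed m_k, o_k)
rng = np.random.default_rng(0)
m = rng.multivariate_normal([1, 0], np.eye(2), size=10)
o = rng.multivariate_normal([0, 1], np.eye(2), size=10)

# evaluate P(x|BLUE) - P(x|ORANGE) on a grid; the Bayes decision
# boundary is where this difference changes sign (the zero contour)
xs, ys = np.meshgrid(np.linspace(-3, 4, 200), np.linspace(-3, 4, 200))
grid = np.column_stack([xs.ravel(), ys.ravel()])
diff = mixture_density(grid, m) - mixture_density(grid, o)
```

Plotting, e.g. `plt.contour(xs, ys, diff.reshape(xs.shape), levels=[0])` with matplotlib, then traces the boundary curve as in Figure 2.5.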