Ex. 4.5
Consider a two-class logistic regression problem with \(x\in\mathbb{R}\). Characterize the maximum-likelihood estimates of the slope and intercept parameter if the sample \(x_i\) for two classes are separated by a point \(x_0\in\mathbb{R}\). Generalize this result to (a) \(x\in \mathbb{R}^p\) (see Figure 4.16 in the textbook), and (b) more than two classes.
Soln. 4.5
Naturally we label \(y_i=1\) for those \(x_i > x_0\) and \(y_i=0\) for those \(x_i < x_0\). The log-likelihood for \(N\) observations is
\[
\ell(\beta_0,\beta_1)=\sum_{i=1}^{N}\Big[y_i(\beta_0+\beta_1 x_i)-\log\big(1+e^{\beta_0+\beta_1 x_i}\big)\Big].
\]
By choosing \(\beta_0 = -\beta_1 x_0\), the equation above simplifies to
\[
\ell(\beta_1)=\sum_{i=1}^{N}\Big[y_i\beta_1(x_i-x_0)-\log\big(1+e^{\beta_1(x_i-x_0)}\big)\Big]
=-\sum_{y_i=1}\log\big(1+e^{-\beta_1(x_i-x_0)}\big)-\sum_{y_i=0}\log\big(1+e^{\beta_1(x_i-x_0)}\big).
\]
It is easy to verify that every observation's contribution to the log-likelihood is negative and tends to \(0\) as \(\beta_1\to\infty\), so \(\ell(\beta_1)\) increases monotonically toward its supremum \(0\) but never attains it. Hence the maximum of the log-likelihood is never achieved: the fit drives \(\hat\beta_1\to\infty\), and \(\hat\beta_0=-\hat\beta_1 x_0\) diverges with it (unless \(x_0=0\)).
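This non-existence can be checked numerically. Below is a minimal sketch (the sample, the threshold \(x_0=0\), and the helper `log_lik` are all invented for illustration): as the slope grows, the log-likelihood climbs toward \(0\) yet stays below it.

```python
import math

def log_lik(beta1, xs, ys, x0):
    """Logistic log-likelihood with the intercept tied to beta0 = -beta1 * x0."""
    ll = 0.0
    for x, y in zip(xs, ys):
        z = beta1 * (x - x0)  # linear predictor beta0 + beta1 * x
        ll += y * z - math.log(1.0 + math.exp(z))
    return ll

# Toy sample separated by x0 = 0: class 0 on the left, class 1 on the right.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

# Increasing the slope only increases the log-likelihood toward its supremum 0.
lls = [log_lik(b, xs, ys, x0=0.0) for b in (1.0, 5.0, 25.0)]
```

Any gradient-based fitting routine applied to such data will therefore report ever-larger coefficients rather than converge.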
- (a) When \(x\in\mathbb{R}^p\) with \(p > 1\), suppose again that the sample is linearly separable: there exist two subsets \(S_1\) and \(S_2\) of \(\{x_1,...,x_N\}\), with \(S_1\cup S_2=\{x_1,...,x_N\}\) and \(S_1\cap S_2 = \emptyset\), and a hyperplane \(\hat\beta^T x = 0\) with \(\hat\beta\in\mathbb{R}^{p+1}\) (each \(x_i\) augmented with a leading constant \(1\)) such that
\[
\hat\beta^T x_i > 0 \ \text{ for } x_i\in S_1, \qquad \hat\beta^T x_i < 0 \ \text{ for } x_i\in S_2.
\]
We label \(y_i=1\) for \(x_i\in S_1\) and \(y_i=0\) for \(x_i\in S_2\). The log-likelihood becomes
\[
\ell(\beta)=\sum_{i=1}^{N}\Big[y_i\,\beta^T x_i-\log\big(1+e^{\beta^T x_i}\big)\Big].
\]
As in the case \(p=1\), if we keep rescaling along the separating direction, \(\beta^{\text{new}}\leftarrow \alpha\beta^{\text{old}}\) starting from \(\hat\beta\) with \(\alpha\to+\infty\), every fitted probability tends to its observed class label, so the log-likelihood \(\ell(\beta)\to 0\), its supremum, and again no finite maximizer exists.
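The scaling argument can be illustrated in \(\mathbb{R}^2\) (a hypothetical sketch; the four points and the separating direction `beta_hat` are invented): multiplying a separating \(\hat\beta\) by a growing factor \(\alpha\) pushes the log-likelihood up toward \(0\).

```python
import math

def log_lik(beta, X, ys):
    """Logistic log-likelihood; rows of X are augmented as (1, x1, ..., xp)."""
    ll = 0.0
    for x, y in zip(X, ys):
        z = sum(b * xi for b, xi in zip(beta, x))  # beta^T x
        ll += y * z - math.log(1.0 + math.exp(z))
    return ll

# Toy separable data in R^2: the line x1 + x2 = 0 separates the classes.
X = [(1.0, -1.0, -2.0), (1.0, -2.0, 1.0), (1.0, 2.0, -1.0), (1.0, 1.0, 2.0)]
ys = [0, 0, 1, 1]
beta_hat = (0.0, 1.0, 1.0)  # beta_hat^T x < 0 on class 0, > 0 on class 1

# Rescaling beta_hat by alpha drives the log-likelihood up toward 0.
lls = [log_lik(tuple(a * b for b in beta_hat), X, ys) for a in (1.0, 5.0, 20.0)]
```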
- (b) For the case of \(K>2\) classes in \(\mathbb{R}^p\) with \(p>1\), assuming each class is linearly separable from the rest, the same argument applies. Taking the \(K\)-th class as the reference (\(\beta_K\equiv 0\)) and writing \(g_i\) for the class of observation \(i\), the log-likelihood becomes
\[
\ell(\beta)=\sum_{i=1}^{N}\Big[\beta_{g_i}^T x_i-\log\Big(1+\sum_{l=1}^{K-1} e^{\beta_l^T x_i}\Big)\Big].
\]
By separability there exist \(\hat\beta_k\) for \(k=1,...,K-1\) such that \(\hat\beta^T_k x > 0\) for \(x\in S_k\) and \(\hat\beta^T_k x < 0\) otherwise, where \(S_k\) denotes the sample points of class \(k\). Setting \(\beta_k=\alpha\hat\beta_k\) and letting \(\alpha\to+\infty\), as in (a), drives each observation's fitted probability of its own class to \(1\), so the log-likelihood \(\ell(\beta)\to 0\), its supremum, and again no finite maximum-likelihood estimate exists.
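The multiclass version of the scaling argument can be sketched with the multilogit likelihood (the points, labels, and directions `beta_hats` below are all invented; the last class serves as the reference with its coefficients fixed at zero):

```python
import math

def multilogit_log_lik(betas, X, labels):
    """Multinomial-logit log-likelihood; betas holds K-1 coefficient vectors,
    the K-th class is the reference with coefficients fixed at 0."""
    ll = 0.0
    for x, g in zip(X, labels):
        zs = [sum(b * xi for b, xi in zip(beta, x)) for beta in betas] + [0.0]
        ll += zs[g] - math.log(sum(math.exp(z) for z in zs))
    return ll

# Three separable classes in R^2, one point each (augmented with a leading 1).
X = [(1.0, 3.0, 0.0), (1.0, -3.0, 0.0), (1.0, 0.0, 3.0)]
labels = [0, 1, 2]
# beta_hats[k]^T x > 0 exactly on class k, and < 0 on the other classes.
beta_hats = [(-1.0, 1.0, -1.0), (-1.0, -1.0, -1.0)]

# Scaling every beta_hat by alpha drives the log-likelihood toward 0.
lls = [multilogit_log_lik([tuple(a * b for b in bh) for bh in beta_hats], X, labels)
       for a in (1.0, 4.0, 10.0)]
```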