Ex. 18.11

Ex. 18.11

Suppose you have a sample of \(N\) pairs \((x_i, y_i)\), with \(y_i\) binary and \(x_i\in \mathbb{R}^1\). Suppose also that the two classes are separable; e.g., for each pair \(i, i'\) with \(y_i=0\) and \(y_{i'}=1\), \(x_{i'}-x_i\ge C\) for some \(C > 0\). You wish to fit a linear logistic regression model logitPr\((Y=1|X)=\alpha + \beta X\) by maximum-likelihood. Show that \(\hat\beta\) is undefined.

Soln. 18.11

Linear logistic regression is discussed in Section 4.4. Recall (4.20) in the text, the likelihood we need to maximize is

\[\begin{eqnarray} l(\alpha, \beta) &=& \sum_{i=1}^N\{y_i(\alpha+\beta x_i)-\log(1+\exp(\alpha + \beta x_i))\}\non\\ &=&\sum_{i:y_i = 1}(\alpha+\beta x_i - \log(1+\exp(\alpha + \beta x_i) ) - \sum_{i: y_i=0}\log(1+\exp(\alpha + \beta x_i).\non \end{eqnarray}\]

Without loss of generality, we assume that \(x_i\ge C\) for when \(y_i=1\) and \(x_i < 0\) when \(y_i=0\). This can be achieved by shifting the constant \(\alpha\). Consider the first summation above, take the derivative with respect to \(\beta\), we get

\[\begin{equation} \sum_{i:y_i=1} \frac{x_i}{1+e^{\alpha+\beta x_i}} > 0.\non \end{equation}\]

Therefore the first summation is increasing in \(\beta\). Similarly, it's easy to verify that the second summation is decreasing in \(\beta\). Therefore the likelihood is increasing in \(\beta\), so the optimal \(\hat\beta\) is undefined.