Ex. 18.3
Show that the fitted coefficients for the regularized multiclass logistic regression problem (18.10) satisfy \(\sum_{k=1}^K\hat\beta_{kj}=0, j=1,...,p.\) What about the \(\hat\beta_{k0}\)? Discuss issues with these constant parameters, and how they can be resolved.
Soln. 18.3
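Let \(L(\bm{\beta}) = \sum_{i=1}^N\log \text{Pr}(g_i|x_i)\) denote the multinomial log-likelihood. The objective function in (18.10) can be written as
\begin{equation}
	L(\bm{\beta}) - \frac{\lambda}{2}\sum_{k=1}^K\|\beta_k\|_2^2.\non
\end{equation}
Taking the first-order derivative w.r.t. \(\beta_k\) and setting it to zero, we get
\begin{equation}
\label{eq:18-3a}
	\frac{\partial L(\bm{\beta})}{\partial \beta_k} - \lambda\beta_k = 0, \quad k=1,...,K.
\end{equation}
Note that \(\frac{\partial L(\bm{\beta})}{\partial \beta_k} = \sum_{i=1}^N x_i\left(\bm{1}(g_i=k) - \text{Pr}(G=k|x_i)\right)\), and both the indicators and the probabilities sum to one over \(k\). By the resulting fact that \(\sum_{k=1}^K\frac{\partial L(\bm{\beta})}{\partial \beta_k} = 0\), summing \(\eqref{eq:18-3a}\) over \(k\) leads to
\begin{equation}
	\lambda\sum_{k=1}^K\hat\beta_k = 0,\non
\end{equation}
so that, for \(\lambda > 0\), \(\sum_{k=1}^K\hat\beta_{kj}=0\) for \(j=1,...,p\).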
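As a numerical sanity check of the sum-to-zero property, the following is a minimal sketch that fits the ridge-penalized multinomial model by plain gradient ascent on simulated data. The function names (`softmax`, `fit_ridge_multinomial`), the step size, the penalty `lam` and the data are all illustrative assumptions, not part of the exercise. The coefficients are deliberately initialized with nonzero sums across classes, so any sum-to-zero behavior at the end comes from the penalty and not from the starting point.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_ridge_multinomial(X, y, K, lam=10.0, lr=0.5, n_iter=2000, seed=0):
    """Gradient ascent on L(beta) - (lam/2) * sum_k ||beta_k||^2.

    Intercepts are left unpenalized, as in (18.10). Hypothetical helper.
    """
    N, p = X.shape
    rng = np.random.default_rng(seed)
    Y = np.eye(K)[y]                # one-hot class indicators, N x K
    B = rng.normal(size=(p, K))     # column k holds beta_k (random start)
    b0 = np.zeros(K)                # intercepts beta_{k0}
    for _ in range(n_iter):
        P = softmax(X @ B + b0)     # fitted class probabilities
        grad_B = X.T @ (Y - P) - lam * B   # dL/dbeta_k - lam * beta_k
        grad_b0 = (Y - P).sum(axis=0)      # dL/dbeta_{k0}, no penalty
        B += lr * grad_B / N
        b0 += lr * grad_b0 / N
    return B, b0

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 3, size=200)

B, b0 = fit_ridge_multinomial(X, y, K=3)
# Sum over classes k of beta_{kj}, one entry per feature j.
print(np.abs(B.sum(axis=1)).max())
```

The printed value should be at the level of floating-point round-off, since the class-summed gradient of the log-likelihood vanishes and only the penalty term acts on \(\sum_k \beta_k\).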
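The constant parameters \(\hat\beta_{k0}\) are not penalized in (18.10), so the argument above does not apply to them. Moreover, they are not identifiable, in the sense that if we add a common constant \(\alpha\) to each of \(\hat\beta_{k0}\), then the derived probabilities are not changed. Therefore, we need to impose an additional constraint or regularization on \(\hat\beta_{k0}\), e.g., the sum-to-zero constraint
\begin{equation}
	\sum_{k=1}^K\hat\beta_{k0} = 0.\non
\end{equation}
Alternatively, one of the \(\hat\beta_{k0}\) can be fixed at zero.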
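The shift invariance of the intercepts can be checked directly. The sketch below uses made-up coefficients and a hypothetical `softmax` helper; it shows that adding a common constant `alpha` to every intercept leaves the class probabilities unchanged, and that centering the intercepts picks one representative whose entries sum to zero.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))    # 4 observations, p = 3 features (illustrative)
B = rng.normal(size=(3, 3))    # column k holds beta_k, K = 3 classes
b0 = rng.normal(size=3)        # intercepts beta_{k0}
alpha = 2.5                    # arbitrary common shift

P = softmax(X @ B + b0)
P_shifted = softmax(X @ B + (b0 + alpha))

# Centering enforces sum_k beta_{k0} = 0 without changing the probabilities.
b0_centered = b0 - b0.mean()
P_centered = softmax(X @ B + b0_centered)

print(np.allclose(P, P_shifted), np.allclose(P, P_centered))
```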