Ex. 17.12
Consider a Poisson regression problem with \(p\) binary variables \(x_{ij}, j = 1,...,p\) and response variable \(y_i\), which measures the number of observations with predictor \(x_i\in \{0,1\}^p\). The design is balanced, in that all \(n = 2^p\) possible combinations are measured. We assume a log-linear model for the Poisson mean in each cell
\[\begin{equation}
\log(\mu(X)) = \theta_{00} + \sum_{(j,k)\in E}x_{ij}x_{ik}\theta_{jk},\non
\end{equation}\]
using the same notation as in Section 17.4.1 (including the constant variable \(x_{i0}=1\ \forall i\)). We assume the response is distributed as
\[\begin{equation}
\text{Pr}(Y=y|X=x) = \frac{e^{-\mu(x)}\mu(x)^y}{y!}.\non
\end{equation}\]
Write down the conditional log-likelihood for the observed responses \(y_i\), and compute the gradient.
(a) Show that the gradient equation for \(\theta_{00}\) computes the partition function (17.29).
(b) Show that the gradient equations for the remainder of the parameters are equivalent to the gradient (17.34).
Soln. 17.12
The conditional log-likelihood is
\[\begin{eqnarray}
l(\bm{\Theta}) &=& \sum_{i=1}^N\log\frac{e^{-\mu(x_i)}\mu(x_i)^{y_i}}{y_i!}\non\\
&=&\sum_{i=1}^N\left[-\mu(x_i) + y_i\log(\mu(x_i))-\log(y_i!)\right].\non
\end{eqnarray}\]
Note that
\[\begin{equation}
\mu(x_i) = \exp\left(\theta_{00} + \sum_{(j,k)\in E}x_{ij}x_{ik}\theta_{jk}\right).\non
\end{equation}\]
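As a sanity check, the log-likelihood above is easy to evaluate numerically over the full balanced design. The sketch below is purely illustrative: the choice \(p=3\), the edge set `E`, the parameter values, the random seed, and the helper names `mu` and `loglik` are assumptions made here, not part of the exercise.

```python
import itertools
import numpy as np
from scipy.special import gammaln

# Illustrative setup (all values are assumptions, not from the text):
# p = 3 binary variables, an assumed edge set E, and arbitrary parameters.
p = 3
X = np.array(list(itertools.product([0, 1], repeat=p)))  # all 2^p cells
E = [(0, 1), (1, 2)]                                      # assumed edge set
theta00 = 0.5
theta = {(0, 1): 0.8, (1, 2): -0.3}

def mu(X, theta00, theta):
    """Poisson mean per cell: log mu = theta00 + sum_{(j,k) in E} x_j x_k theta_jk."""
    eta = theta00 + sum(theta[jk] * X[:, jk[0]] * X[:, jk[1]] for jk in E)
    return np.exp(eta)

def loglik(y, X, theta00, theta):
    """Conditional log-likelihood l(Theta) = sum_i [-mu_i + y_i log mu_i - log y_i!]."""
    m = mu(X, theta00, theta)
    return np.sum(-m + y * np.log(m) - gammaln(y + 1))

rng = np.random.default_rng(0)
y = rng.poisson(mu(X, theta00, theta))   # simulated counts, one per cell
print(loglik(y, X, theta00, theta))
```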
(a) We have
\[\begin{eqnarray}
\frac{\partial l(\bm{\Theta})}{\partial \theta_{00}} &=& \sum_{i=1}^N(-\mu(x_i) + y_i),\non
\end{eqnarray}\]
so that
\[\begin{eqnarray}
\sum_{i=1}^Ny_i &=& \sum_{i=1}^N\exp\left(\theta_{00} + \sum_{(j,k)\in E}x_{ij}x_{ik}\theta_{jk}\right)\non\\
&=&\exp(\theta_{00})\sum_{i=1}^N\exp\left(\sum_{(j,k)\in E}x_{ij}x_{ik}\theta_{jk}\right).\non
\end{eqnarray}\]
Solving for \(\theta_{00}\), we get
\[\begin{equation}
\theta_{00} = \log\left(\sum_{i=1}^Ny_i\right) - \Phi(\bm{\Theta}).\non
\end{equation}\]
Note that \(N=2^p\) since the design is balanced, so \(\sum_{x\in \mathcal{X}}\) in (17.29) is the same as \(\sum_{i=1}^{N=2^p}\). In other words, solving the gradient equation for \(\theta_{00}\) amounts to computing the log partition function \(\Phi(\bm{\Theta})\) of (17.29).
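Continuing the illustrative sketch above (same assumed setup), one can check the \(\theta_{00}\) gradient by finite differences and verify that \(\theta_{00} = \log\left(\sum_i y_i\right) - \Phi(\bm{\Theta})\) makes the gradient vanish; evaluating \(\Phi(\bm{\Theta})\) here is exactly the partition-function computation of (17.29).

```python
# Continues the sketch above: check the theta_00 gradient equation numerically.
def dl_dtheta00(y, X, theta00, theta):
    """Analytic gradient: sum_i (y_i - mu(x_i))."""
    return np.sum(y - mu(X, theta00, theta))

# Finite-difference check of the analytic gradient (illustrative only).
eps = 1e-6
fd = (loglik(y, X, theta00 + eps, theta) - loglik(y, X, theta00 - eps, theta)) / (2 * eps)
print(fd, dl_dtheta00(y, X, theta00, theta))   # the two values should agree closely

# Solving the gradient equation gives theta_00 = log(sum_i y_i) - Phi(Theta),
# with Phi the log partition function of (17.29).
Phi = np.log(np.sum(np.exp(sum(theta[jk] * X[:, jk[0]] * X[:, jk[1]] for jk in E))))
theta00_hat = np.log(y.sum()) - Phi
print(dl_dtheta00(y, X, theta00_hat, theta))   # ~ 0 at the stationary value
```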
(b) From (a) we have the solution for \(\theta_{00}\), and therefore
\[\begin{eqnarray}
\mu(x_i) &=& \exp\left( \log\left(\sum_{l=1}^Ny_l\right) - \Phi(\bm{\Theta}) + \sum_{(j,k)\in E}x_{ij}x_{ik}\theta_{jk}\right)\non\\
&=&\left(\sum_{l=1}^Ny_l\right)p(x_i,\bm{\Theta}),\non
\end{eqnarray}\]
where \(p(x,\bm{\Theta})\) is defined in (17.28).
Taking the derivative of \(l(\bm{\Theta})\) with respect to \(\theta_{jk}\) and setting it to zero, we get
\[\begin{eqnarray}
0&=&\sum_{i=1}^N(y_i-\mu(x_i))x_{ij}x_{ik}\non\\
&=&\sum_{i=1}^N\left(y_i-\left(\sum_{l=1}^Ny_l\right)p(x_i,\bm{\Theta})\right)x_{ij}x_{ik}.\non
\end{eqnarray}\]
Therefore we recover the equivalent form of (17.34) in the text,
\[\begin{equation}
\hat E(X_jX_k) - E_{\bm{\Theta}}(X_jX_k)=0,\non
\end{equation}\]
where
\[\begin{equation}
\hat E(X_jX_k) = \sum_{i=1}^N\frac{y_i}{\sum_{l=1}^Ny_l}x_{ij}x_{ik}\non
\end{equation}\]
and as in (17.33)
\[\begin{equation}
E_{\bm{\Theta}}(X_jX_k) = \sum_{i=1}^Nx_{ij}x_{ik}\cdot p(x_i, \bm{\Theta}).\non
\end{equation}\]
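Again continuing the same illustrative sketch, the identity derived in (b) can be checked numerically: with \(\theta_{00}\) at its stationary value, the gradient for \(\theta_{jk}\) equals \(\left(\sum_l y_l\right)\left(\hat E(X_jX_k) - E_{\bm{\Theta}}(X_jX_k)\right)\). The edge choice \((j,k)=(0,1)\) below is arbitrary.

```python
# Continues the sketches above: with theta_00 at its stationary value, the
# theta_jk gradient matches (sum_l y_l) * (Ehat(XjXk) - E_Theta(XjXk)).
j, k = 0, 1                                    # one assumed edge (j, k) in E
m = mu(X, theta00_hat, theta)                  # mu(x_i) = (sum_l y_l) p(x_i, Theta)
p_x = np.exp(sum(theta[jk] * X[:, jk[0]] * X[:, jk[1]] for jk in E) - Phi)  # (17.28)

grad_jk = np.sum((y - m) * X[:, j] * X[:, k])            # gradient for theta_jk
E_hat   = np.sum(y / y.sum() * X[:, j] * X[:, k])        # empirical moment
E_theta = np.sum(X[:, j] * X[:, k] * p_x)                # model moment, as in (17.33)
print(np.isclose(grad_jk, y.sum() * (E_hat - E_theta)))  # the two forms agree
```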