Ex. 17.12
Consider a Poisson regression problem with \(p\) binary variables \(x_{ij}, j = 1,...,p\) and response variable \(y_i\), which measures the number of observations with predictor \(x_i\in \{0,1\}^p\). The design is balanced, in that all \(n = 2^p\) possible combinations are measured. We assume a log-linear model for the Poisson mean in each cell
\[\begin{equation}
\log(\mu(X)) = \theta_{00} + \sum_{(j,k)\in E}x_{ij}x_{ik}\theta_{jk},\non
\end{equation}\]
using the same notation as in Section 17.4.1 (including the constant variable \(x_{i0}=1\ \forall i\)). We assume the response is distributed as
\[\begin{equation}
\text{Pr}(Y=y|X=x) = \frac{e^{-\mu(x)}\mu(x)^y}{y!}.\non
\end{equation}\]
Write down the conditional log-likelihood for the observed responses \(y_i\), and compute the gradient.
(a) Show that the gradient equation for \(\theta_{00}\) computes the partition function (17.29).
(b) Show that the gradient equations for the remainder of the parameters are equivalent to the gradient (17.34).
Soln. 17.12
The conditional log-likelihood is
\[\begin{eqnarray}
l(\bm{\Theta}) &=& \sum_{i=1}^N\log\frac{e^{-\mu(x_i)}\mu(x_i)^{y_i}}{y_i!}\non\\
&=&\sum_{i=1}^N\left[-\mu(x_i) + y_i\log(\mu(x_i))-\log(y_i!)\right].\non
\end{eqnarray}\]
Note that
\[\begin{equation}
\mu(x_i) = \exp\left(\theta_{00} + \sum_{(j,k)\in E}x_{ij}x_{ik}\theta_{jk}\right).\non
\end{equation}\]
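As a sanity check, the log-likelihood above is easy to evaluate numerically over the full balanced design. The sketch below is purely illustrative: the choice \(p=3\), the edge set `E`, the parameter values, the random seed, and the helper names `mu` and `loglik` are assumptions made here, not part of the exercise.

```python
import itertools
import numpy as np
from scipy.special import gammaln

# Illustrative setup (all values are assumptions, not from the text):
# p = 3 binary variables, an assumed edge set E, and arbitrary parameters.
p = 3
X = np.array(list(itertools.product([0, 1], repeat=p)))  # all 2^p cells
E = [(0, 1), (1, 2)]                                      # assumed edge set
theta00 = 0.5
theta = {(0, 1): 0.8, (1, 2): -0.3}

def mu(X, theta00, theta):
    """Poisson mean per cell: log mu = theta00 + sum_{(j,k) in E} x_j x_k theta_jk."""
    eta = theta00 + sum(theta[jk] * X[:, jk[0]] * X[:, jk[1]] for jk in E)
    return np.exp(eta)

def loglik(y, X, theta00, theta):
    """Conditional log-likelihood l(Theta) = sum_i [-mu_i + y_i log mu_i - log y_i!]."""
    m = mu(X, theta00, theta)
    return np.sum(-m + y * np.log(m) - gammaln(y + 1))

rng = np.random.default_rng(0)
y = rng.poisson(mu(X, theta00, theta))   # simulated counts, one per cell
print(loglik(y, X, theta00, theta))
```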
(a) We have
\[\begin{eqnarray}
\frac{\partial l(\bm{\Theta})}{\partial \theta_{00}} &=& \sum_{i=1}^N(-\mu(x_i) + y_i),\non
\end{eqnarray}\]
so that
\[\begin{eqnarray}
\sum_{i=1}^Ny_i &=& \sum_{i=1}^N\exp\left(\theta_{00} + \sum_{(j,k)\in E}x_{ij}x_{ik}\theta_{jk}\right)\non\\
&=&\exp(\theta_{00})\sum_{i=1}^N\exp\left(\sum_{(j,k)\in E}x_{ij}x_{ik}\theta_{jk}\right).\non
\end{eqnarray}\]
Solving for \(\theta_{00}\), we get
\[\begin{equation}
\theta_{00} = \log\left(\sum_{i=1}^Ny_i\right) - \Phi(\bm{\Theta}).\non
\end{equation}\]
Note that \(N=2^p\) since the design is balanced, so \(\sum_{x\in \mathcal{X}}\) in (17.29) is the same as \(\sum_{i=1}^{N=2^p}\). In other words, solving the gradient equation for \(\theta_{00}\) amounts to computing the log partition function \(\Phi(\bm{\Theta})\) of (17.29).
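Continuing the illustrative sketch above (same assumed setup), one can check the \(\theta_{00}\) gradient by finite differences and verify that \(\theta_{00} = \log\left(\sum_i y_i\right) - \Phi(\bm{\Theta})\) makes the gradient vanish; evaluating \(\Phi(\bm{\Theta})\) here is exactly the partition-function computation of (17.29).

```python
# Continues the sketch above: check the theta_00 gradient equation numerically.
def dl_dtheta00(y, X, theta00, theta):
    """Analytic gradient: sum_i (y_i - mu(x_i))."""
    return np.sum(y - mu(X, theta00, theta))

# Finite-difference check of the analytic gradient (illustrative only).
eps = 1e-6
fd = (loglik(y, X, theta00 + eps, theta) - loglik(y, X, theta00 - eps, theta)) / (2 * eps)
print(fd, dl_dtheta00(y, X, theta00, theta))   # the two values should agree closely

# Solving the gradient equation gives theta_00 = log(sum_i y_i) - Phi(Theta),
# with Phi the log partition function of (17.29).
Phi = np.log(np.sum(np.exp(sum(theta[jk] * X[:, jk[0]] * X[:, jk[1]] for jk in E))))
theta00_hat = np.log(y.sum()) - Phi
print(dl_dtheta00(y, X, theta00_hat, theta))   # ~ 0 at the stationary value
```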
(b) From (a) we have the solution for \(\theta_{00}\), and therefore
\[\begin{eqnarray}
\mu(x_i) &=& \exp\left( \log\left(\sum_{l=1}^Ny_l\right) - \Phi(\bm{\Theta}) + \sum_{(j,k)\in E}x_{ij}x_{ik}\theta_{jk}\right)\non\\
&=&\left(\sum_{l=1}^Ny_l\right)p(x_i,\bm{\Theta}),\non
\end{eqnarray}\]
where \(p(x,\bm{\Theta})\) is defined in (17.28).
Taking the derivative of \(l(\bm{\Theta})\) with respect to \(\theta_{jk}\) and setting it to zero, we get
\[\begin{eqnarray}
0&=&\sum_{i=1}^N(y_i-\mu(x_i))x_{ij}x_{ik}\non\\
&=&\sum_{i=1}^N\left(y_i-\left(\sum_{l=1}^Ny_l\right)p(x_i,\bm{\Theta})\right)x_{ij}x_{ik}.\non
\end{eqnarray}\]
Therefore we recover the equivalent form of (17.34) in the text,
\[\begin{equation}
\hat E(X_jX_k) - E_{\bm{\Theta}}(X_jX_k)=0,\non
\end{equation}\]
where
\[\begin{equation}
\hat E(X_jX_k) = \sum_{i=1}^N\frac{y_i}{\sum_{l=1}^Ny_l}x_{ij}x_{ik}\non
\end{equation}\]
and as in (17.33)
\[\begin{equation}
E_{\bm{\Theta}}(X_jX_k) = \sum_{i=1}^Nx_{ij}x_{ik}\cdot p(x_i, \bm{\Theta}).\non
\end{equation}\]
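Again continuing the same illustrative sketch, the identity derived in (b) can be checked numerically: with \(\theta_{00}\) at its stationary value, the gradient for \(\theta_{jk}\) equals \(\left(\sum_l y_l\right)\left(\hat E(X_jX_k) - E_{\bm{\Theta}}(X_jX_k)\right)\). The edge choice \((j,k)=(0,1)\) below is arbitrary.

```python
# Continues the sketches above: with theta_00 at its stationary value, the
# theta_jk gradient matches (sum_l y_l) * (Ehat(XjXk) - E_Theta(XjXk)).
j, k = 0, 1                                    # one assumed edge (j, k) in E
m = mu(X, theta00_hat, theta)                  # mu(x_i) = (sum_l y_l) p(x_i, Theta)
p_x = np.exp(sum(theta[jk] * X[:, jk[0]] * X[:, jk[1]] for jk in E) - Phi)  # (17.28)

grad_jk = np.sum((y - m) * X[:, j] * X[:, k])            # gradient for theta_jk
E_hat   = np.sum(y / y.sum() * X[:, j] * X[:, k])        # empirical moment
E_theta = np.sum(X[:, j] * X[:, k] * p_x)                # model moment, as in (17.33)
print(np.isclose(grad_jk, y.sum() * (E_hat - E_theta)))  # the two forms agree
```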