Ex. 18.6

Show how the theorem in Section 18.3.5 can be applied to regularized discriminant analysis [Equation (4.14) and Equation (18.9)].

Soln. 18.6

Here we recap Section 4.5 of \cite{hastie2004efficient}, where the authors discuss in detail how the theorem applies to regularized discriminant analysis.

Recall that the LDA model (Section 4.3) assumes the features in each class \(k=1,...,K\) come from a multivariate Gaussian distribution. Each class has its own mean vector \(\mu_k\), but all classes share a common covariance matrix \(\Sigma\). The discriminant function for class \(k\) is

\[\begin{equation} \label{eq:18-6a} \delta_k(x) = x^T\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \log \pi_k \end{equation}\]

where \(\pi_k\) is the prior probability of class \(k\).
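
For concreteness, here is a minimal sketch of evaluating \(\eqref{eq:18-6a}\) once estimates of \(\mu_k\), \(\Sigma\), and \(\pi_k\) are available; the function name and argument layout below are illustrative, not from the text.

```python
import numpy as np

def lda_discriminant(x, mu_k, Sigma, pi_k):
    """Evaluate the LDA discriminant delta_k(x) of eq. (18-6a) for one class.

    x     : (p,) feature vector
    mu_k  : (p,) class mean
    Sigma : (p, p) common covariance matrix (assumed invertible here)
    pi_k  : scalar prior probability of class k
    """
    Sigma_inv_mu = np.linalg.solve(Sigma, mu_k)  # Sigma^{-1} mu_k without forming the inverse
    return x @ Sigma_inv_mu - 0.5 * mu_k @ Sigma_inv_mu + np.log(pi_k)
```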

Parameters are estimated as

\[\begin{eqnarray} \hat\pi_k &=& \frac{n_k}{n}\non\\ \hat\mu_k &=& \frac{1}{n_k}\sum_{y_i=k}x_i\non\\ \hat\Sigma &=&\frac{1}{n-K}\sum_{k=1}^K\sum_{y_i=k}(x_i-\hat\mu_k)(x_i-\hat\mu_k)^T.\non \end{eqnarray}\]
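
A direct translation of these plug-in estimates might look as follows; this is a sketch assuming the data arrive as an \(n\times p\) matrix `X` with class labels `y`, and the variable names are my own.

```python
import numpy as np

def lda_estimates(X, y):
    """Plug-in LDA estimates: class priors, class means, pooled covariance."""
    n, p = X.shape
    classes = np.unique(y)
    K = len(classes)
    pi_hat = np.array([np.mean(y == k) for k in classes])          # n_k / n
    mu_hat = np.array([X[y == k].mean(axis=0) for k in classes])   # K x p class means
    Sigma_hat = np.zeros((p, p))
    for k, mu in zip(classes, mu_hat):
        Xc = X[y == k] - mu                                        # center class k
        Sigma_hat += Xc.T @ Xc
    Sigma_hat /= (n - K)                                           # pooled, denominator n - K
    return pi_hat, mu_hat, Sigma_hat
```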

These estimates are plugged into \(\eqref{eq:18-6a}\), which requires inverting the \(p\times p\) matrix \(\hat\Sigma\); when \(p\gg n\), \(\hat\Sigma\) is singular. Regularized discriminant analysis overcomes this issue by replacing \(\hat\Sigma\) with

\[\begin{equation} \hat\Sigma(\gamma) = \gamma\hat\Sigma + (1-\gamma)\text{diag}(\hat\Sigma), \non \end{equation}\]

for \(\gamma \in [0,1]\).
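
In code, this shrinkage toward the diagonal is a one-liner; again a sketch, with `regularize_cov` an illustrative name.

```python
import numpy as np

def regularize_cov(Sigma_hat, gamma):
    """Sigma(gamma) = gamma * Sigma_hat + (1 - gamma) * diag(Sigma_hat)."""
    return gamma * Sigma_hat + (1.0 - gamma) * np.diag(np.diag(Sigma_hat))
```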

Following the same arguments as in Ex. 18.5, it is easy to show that \(\eqref{eq:18-6a}\) and its regularized version are invariant under a coordinate rotation. Hence we can once again use the SVD construction, replace the training points \(x_i\) by their corresponding \(r_i\), and fit the regularized model in the lower-dimensional space. Again, the \(n\)-dimensional linear coefficients

\[\begin{equation} \hat\beta_k^\ast = (\gamma\hat\Sigma^\ast + (1-\gamma)\text{diag}(\hat\Sigma^\ast))^{-1}\hat\mu_k^\ast\non \end{equation}\]

are mapped back to \(p\) dimensions via \(\hat\beta_k = \bV\hat\beta_k^\ast\).
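
Putting the pieces together, a sketch of the SVD shortcut for the regularized fit might look as follows; it assumes \(p\gg n\), and the function and variable names are illustrative rather than taken from the text.

```python
import numpy as np

def rda_coefficients(X, y, gamma):
    """Regularized LDA coefficients via the SVD shortcut of Section 18.3.5.

    X = R V^T with R = U D, the n-dimensional scores r_i replace the x_i,
    the regularized model is fit in the reduced space, and each beta_k^*
    is mapped back to p dimensions via V.
    """
    n, p = X.shape
    # Thin SVD: X (n x p) = U (n x n) D (n x n) V^T (n x p) when p >= n
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt.T                      # (p, n), orthonormal columns
    R = U * d                     # (n, n) rows are the scores r_i

    classes = np.unique(y)
    K = len(classes)
    mu_star = np.array([R[y == k].mean(axis=0) for k in classes])  # class means in reduced space
    Sigma_star = np.zeros((n, n))                                  # pooled covariance of the scores
    for k, mu in zip(classes, mu_star):
        Rc = R[y == k] - mu
        Sigma_star += Rc.T @ Rc
    Sigma_star /= (n - K)
    Sigma_star_gamma = gamma * Sigma_star + (1 - gamma) * np.diag(np.diag(Sigma_star))

    # beta_k^* = Sigma^*(gamma)^{-1} mu_k^*, mapped back via beta_k = V beta_k^*
    beta_star = np.linalg.solve(Sigma_star_gamma, mu_star.T)       # (n, K)
    return V @ beta_star                                           # (p, K)
```

Only the one-time SVD and the final map back to \(p\) dimensions touch \(p\)-dimensional quantities; the regularized inversion itself happens entirely in the \(n\)-dimensional space of the scores.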