Ex. 18.6
Show how the theorem in Section 18.3.5 can be applied to regularized discriminant analysis [Equation (4.14) and Equation (18.9)].
Soln. 18.6
Here we recap Section 4.5 of \cite{hastie2004efficient}, where the authors discuss in detail how the theorem applies to regularized discriminant analysis.
Recall that the LDA model (Section 4.3) assumes that, within each class \(k=1,\ldots,K\), the features follow a multivariate Gaussian distribution with class-specific mean vector \(\mu_k\) and a common covariance matrix \(\Sigma\). The discriminant function for class \(k\) is
\begin{equation}
\delta_k(x) = x^T\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \log\pi_k, \label{eq:18-6a}
\end{equation}
where \(\pi_k\) is the prior probability of class \(k\).
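For later use, note that \eqref{eq:18-6a} is linear in \(x\). Writing it as \(\delta_k(x) = x^T\beta_k + \beta_{0k}\) identifies the linear coefficients that the SVD construction below operates on; this decomposition is implicit in the solution and is spelled out here only for clarity:
\begin{align*}
\beta_k &= \Sigma^{-1}\mu_k, &
\beta_{0k} &= -\tfrac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \log\pi_k .
\end{align*}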
The parameters are estimated from the training data as
\begin{align*}
\hat\pi_k &= N_k/N, \\
\hat\mu_k &= \frac{1}{N_k}\sum_{g_i=k} x_i, \\
\hat\Sigma &= \frac{1}{N-K}\sum_{k=1}^{K}\sum_{g_i=k}(x_i-\hat\mu_k)(x_i-\hat\mu_k)^T,
\end{align*}
where \(N_k\) is the number of training observations \(x_i\) with class label \(g_i = k\).
These estimates are plugged into \eqref{eq:18-6a}, which requires inverting \(\hat\Sigma\); when \(p\gg N\) this \(p\times p\) matrix is singular. Regularized discriminant analysis overcomes the issue by replacing \(\hat\Sigma\) with
\begin{equation}
\hat\Sigma(\gamma) = \gamma\hat\Sigma + (1-\gamma)\hat\sigma^2 I_p \label{eq:18-6b}
\end{equation}
for \(\gamma \in [0,1]\).
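As a brief check (not part of the original argument, and using the scalar-shrinkage form \eqref{eq:18-6b} written above): if \(\hat\Sigma\) has orthonormal eigenvectors \(v_1,\ldots,v_p\) with eigenvalues \(\lambda_j \ge 0\), then
\[
\hat\Sigma(\gamma) = \sum_{j=1}^{p} \bigl(\gamma\lambda_j + (1-\gamma)\hat\sigma^2\bigr) v_j v_j^T ,
\]
so every eigenvalue of \(\hat\Sigma(\gamma)\) is at least \((1-\gamma)\hat\sigma^2 > 0\) for \(\gamma < 1\), and the inversion required by \eqref{eq:18-6a} is well defined even though \(\hat\Sigma\) itself has rank at most \(N-K < p\).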
Following the same arguments as in Ex. 18.5, it is easy to show that \eqref{eq:18-6a} and its regularized version are invariant under a coordinate rotation. Hence we can once again use the SVD construction, replace the training points \(x_i\) by their corresponding \(r_i\), and fit the regularized model in the lower-dimensional space. Again, the \(N\)-dimensional linear coefficients fitted there,
\begin{equation*}
\hat\beta_k^\ast = \hat\Sigma^\ast(\gamma)^{-1}\hat\mu_k^\ast ,
\end{equation*}
where \(\hat\Sigma^\ast(\gamma)\) and \(\hat\mu_k^\ast\) denote the regularized within-class covariance and the class means computed from the \(r_i\), are mapped back to \(p\) dimensions via \(\hat\beta_k = \bV\hat\beta_k^\ast\).
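The reduction can also be checked numerically. Below is a small NumPy sketch (our illustration, not taken from the text or from \cite{hastie2004efficient}): it fits the regularized discriminant coefficients once directly in \(p\) dimensions and once on the reduced data \(r_i\), maps the latter back through \(\bV\), and verifies that the two fits give the same discriminant scores. The helper \texttt{rda\_coefs}, the toy data, and the fixed shrinkage target \(\hat\sigma^2 = 1\) are assumptions made for the demonstration; the key point is that the same scalar \(\hat\sigma^2\) is used in both fits, so that they solve the same regularized problem.
\begin{verbatim}
# Numerical sketch (illustrative, not from the references): fit the
# regularized discriminant coefficients directly in p dimensions and via
# the SVD reduction, then check that the discriminant scores coincide.
import numpy as np

rng = np.random.default_rng(0)
N, p, K = 30, 400, 3                              # p >> N
y = np.repeat(np.arange(K), N // K)               # balanced classes 0, 1, 2
X = rng.normal(size=(N, p)) + 0.5 * y[:, None]    # class-shifted features

def rda_coefs(Z, y, gamma, sigma2):
    """Coefficients (beta_k, b_k) of the regularized discriminant
    delta_k(z) = z^T beta_k + b_k, using the regularized covariance
    Sigma(gamma) = gamma*Sigma + (1 - gamma)*sigma2*I."""
    n, d = Z.shape
    classes = np.unique(y)
    mus = np.array([Z[y == k].mean(axis=0) for k in classes])  # K x d
    Zc = Z - mus[np.searchsorted(classes, y)]                  # center by class
    Sigma = Zc.T @ Zc / (n - len(classes))                     # pooled covariance
    Sigma_g = gamma * Sigma + (1 - gamma) * sigma2 * np.eye(d)
    betas = np.linalg.solve(Sigma_g, mus.T).T                  # K x d
    priors = np.array([(y == k).mean() for k in classes])
    b = -0.5 * np.einsum('kd,kd->k', mus, betas) + np.log(priors)
    return betas, b

gamma, sigma2 = 0.7, 1.0            # sigma2 kept fixed in both fits

# direct fit in the full p-dimensional space
betas_full, b_full = rda_coefs(X, y, gamma, sigma2)

# SVD reduction:  X = U D V^T = R V^T  with  R = U D  (N x N)
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
R = U * sv
betas_r, b_r = rda_coefs(R, y, gamma, sigma2)     # fit on the r_i
betas_back = betas_r @ Vt                         # beta_k = V beta_k^*

# the two fits give identical discriminant scores on new points
X_new = rng.normal(size=(5, p))
scores_full = X_new @ betas_full.T + b_full
scores_red = X_new @ betas_back.T + b_r
print(np.allclose(scores_full, scores_red))       # True
\end{verbatim}
Solving the \(N\times N\) system in the reduced space costs \(O(N^3)\) rather than the \(O(p^3)\) required in the full space, which is the computational payoff of the theorem.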