Ex. 18.10

When \(p\gg N\), linear discriminant analysis (see Section 4.3) is degenerate because the within-class covariance matrix \(\bb{W}\) is singular. One version of regularized discriminant analysis (4.14) replaces \(\bb{W}\) by a ridged version \(\bb{W} + \lambda \bb{I}\), leading to a regularized discriminant function \(\delta_\lambda(x)=x^T(\bb{W} + \lambda\bb{I})^{-1}(\bar x_1 - \bar x_{-1})\). Show that \(\delta_0(x)=\lim_{\lambda\downarrow 0}\delta_\lambda(x)\) corresponds to the maximal data piling direction defined in Exercise 18.8.

Soln. 18.10

Note that we need to assume \(\bb{X}\) is centered (each column has mean zero); this is used below to relate \(\bb{X}^T\by\) to \(\bar x_1 - \bar x_{-1}\).

By Ex. 18.8 (c) and Ex. 18.7 (b), the maximal data piling direction is \(\hat\beta= \bb{V}(\bb{R}^T\bb{R})^{-1}\bb{R}^T\by\), which further equals \((\bb{X}^T\bb{X})^{+}\bb{X}^T\by\) by Ex. 18.4, where \((\cdot)^{+}\) denotes the Moore-Penrose pseudoinverse; when \(p\gg N\), \(\bb{X}^T\bb{X}\) is singular, so the minimum-norm least squares solution plays the role of \((\bb{X}^T\bb{X})^{-1}\bb{X}^T\by\). See (18.14)-(18.15) in the text for related discussions.
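
As a quick numerical sanity check (not part of the original solution; the simulated data below are purely illustrative), one can verify that \(\bb{V}(\bb{R}^T\bb{R})^{-1}\bb{R}^T\by\) and \((\bb{X}^T\bb{X})^{+}\bb{X}^T\by\) coincide. A minimal sketch in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 100                            # p >> N regime
X = rng.standard_normal((N, p))
X -= X.mean(axis=0)                       # center X, as assumed above
y = np.where(np.arange(N) < N // 2, 1.0, -1.0)

# Reduced SVD X = R V^T with R = U D; keep only nonzero singular values,
# since centering drops the rank of X to N - 1
U, d, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(d > 1e-10 * d[0]))         # numerical rank
R, V = U[:, :r] * d[:r], Vt[:r].T

beta_svd = V @ np.linalg.solve(R.T @ R, R.T @ y)   # V (R^T R)^{-1} R^T y
beta_mp = np.linalg.pinv(X.T @ X) @ (X.T @ y)      # (X^T X)^+ X^T y

print(np.allclose(beta_svd, beta_mp))     # True
```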

Note that in the binary case, where \(y_i=1\) or \(-1\), \(\bb{X}^T\by\) has the same direction as \((\bar x_1-\bar x_{-1})\): centering gives \(N_1\bar x_1 + N_{-1}\bar x_{-1}=0\), so \(\bb{X}^T\by = N_1\bar x_1 - N_{-1}\bar x_{-1}\propto \bar x_1-\bar x_{-1}\). With \(\bb{W} = \bb{X}^T\bb{X}\), which is \(p\times p\) and singular when \(p\gg N\), it suffices to show that

\[\begin{equation} \lim_{\lambda\ra 0}(\bb{W}+\lambda \bb{I})^{-1}\bb{X}^T\by = \bb{W}^{+}\bb{X}^T\by = (\bb{X}^T\bb{X})^{+}\bb{X}^T\by.\non \end{equation}\]
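
Both claims are easy to illustrate numerically (again a hedged sketch on simulated centered data, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 100
X = rng.standard_normal((N, p))
X -= X.mean(axis=0)                          # centered, p >> N
y = np.where(np.arange(N) < N // 2, 1.0, -1.0)

# X^T y is parallel to xbar_1 - xbar_{-1} once X is centered
diff = X[y == 1].mean(axis=0) - X[y == -1].mean(axis=0)
u = X.T @ y
print(u @ diff / (np.linalg.norm(u) * np.linalg.norm(diff)))  # cosine ~ 1.0

W = X.T @ X                                  # p x p, singular (rank <= N - 1)
target = np.linalg.pinv(W) @ (X.T @ y)       # W^+ X^T y
for lam in [1e-1, 1e-3, 1e-5, 1e-7]:
    ridge = np.linalg.solve(W + lam * np.eye(p), X.T @ y)
    print(lam, np.linalg.norm(ridge - target))   # gap -> 0 as lambda -> 0
```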

To see this, we start with the spectral decomposition of \(\bb{W}\),

\[\begin{equation} \bb{W} = \sum_{i=1}^p e_i\bb{v}_i\bb{v}_i^T,\non \end{equation}\]

where \(\bb{v}_i\) is the \(i\)-th eigenvector of \(\bb{W}\) and \(e_i\ge 0\) is its corresponding eigenvalue; since \(\text{rank}(\bb{W})\le N<p\), at most \(N\) of the eigenvalues are nonzero. Then

\[\begin{equation} (\bb{W}+\lambda \bb{I})^{-1} = \sum_{i=1}^p\frac{\bb{v}_i\bb{v}_i^T}{\bb{e}_i+\lambda}.\non \end{equation}\]

Now let \(\lambda\downarrow 0\). Each term with \(e_i>0\) converges to \(\bb{v}_i\bb{v}_i^T/e_i\). A term with \(e_i=0\) equals \(\bb{v}_i\bb{v}_i^T/\lambda\), which diverges on its own, but it is annihilated when applied to \(\bb{X}^T\by\): if \(e_i=0\), then \(\|\bb{X}\bb{v}_i\|^2=\bb{v}_i^T\bb{W}\bb{v}_i=0\), so \(\bb{v}_i^T\bb{X}^T\by=(\bb{X}\bb{v}_i)^T\by=0\). Hence

\[\begin{equation} \lim_{\lambda\ra 0}(\bb{W}+\lambda \bb{I})^{-1}\bb{X}^T\by = \sum_{i:\,e_i>0}\frac{\bb{v}_i\bb{v}_i^T}{e_i}\bb{X}^T\by = \bb{W}^{+}\bb{X}^T\by,\non \end{equation}\]

which is the maximal data piling direction, completing the proof.
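
The key annihilation step, that eigenvectors of \(\bb{W}\) with zero eigenvalue are orthogonal to \(\bb{X}^T\by\), can also be checked directly (same illustrative simulated setup as above):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 100
X = rng.standard_normal((N, p))
X -= X.mean(axis=0)
y = np.where(np.arange(N) < N // 2, 1.0, -1.0)

W = X.T @ X
evals, evecs = np.linalg.eigh(W)             # eigenvalues in ascending order
null_vecs = evecs[:, evals < 1e-8]           # eigenvectors with e_i ~ 0
print(null_vecs.shape[1])                    # p - (N - 1) of them
print(np.abs(null_vecs.T @ (X.T @ y)).max()) # ~ 0: the null space kills X^T y
```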