Ex. 4.3
Suppose we transform the original predictors \(\bb{X}\) to \(\boldsymbol{\hat Y}\) via linear regression. In detail, let \(\boldsymbol{\hat Y} = \bb{X}(\bb{X}^T\bb{X})^{-1}\bb{X}^T\bb{Y} = \bb{X}\hat B\), where \(\bb{Y}\) is the indicator response matrix. Similarly, for any input \(x\in\mathbb{R}^p\), we get a transformed vector \(\hat y = \hat B^Tx\in \mathbb{R}^K\). Show that LDA using \(\boldsymbol{\hat Y}\) is identical to LDA in the original space.
Soln. 4.3
We start by introducing the notation used in Chapter 3.
Let \(x_i^T = (x_{i1}, ..., x_{ip})\in \mathbb{R}^{1\times p}\), \(\bb{1}^T = (1,...,1)\in \mathbb{R}^{1\times N}\), \(Y^T = (y_1, ..., y_N)\in \mathbb{R}^{1\times N}\), \(\beta^T = (\beta_{1}, ..., \beta_{p})\in \mathbb{R}^{1\times p}\). Let
\[
\bb{X} = \begin{pmatrix}
x_1^T\\
x_2^T\\
\vdots\\
x_N^T
\end{pmatrix}\in\mathbb{R}^{N\times p}
\]
and recall from Chapter 3 that regressing a single response \(Y\) on \(\bb{X}\) gives
\[
\hat\beta = (\bb{X}^T\bb{X})^{-1}\bb{X}^TY\in\mathbb{R}^{p}.
\]
Regressing each of the \(K\) columns of the indicator response matrix \(\bb{Y}\in\mathbb{R}^{N\times K}\) in this way, we have
\[
\boldsymbol{\hat Y} = \bb{X}(\bb{X}^T\bb{X})^{-1}\bb{X}^T\bb{Y} = \bb{X}\hat B, \qquad \hat B = (\bb{X}^T\bb{X})^{-1}\bb{X}^T\bb{Y}\in\mathbb{R}^{p\times K},
\]
and \(\hat y = \hat B^Tx\) for a single sample \(x\). We estimate the parameters of the Gaussian distributions from the transformed data, denoted by \(\pi_k^{\text{new}}\), \(\hat\mu_k^{\text{new}}\), and \(\hat\Sigma^{\text{new}}\), and relate them to \(\pi_k\), \(\hat\mu_k\), and \(\hat\Sigma\) estimated from the original training data.
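For concreteness, here is a minimal NumPy sketch of this setup on synthetic data. The sizes \(N, p, K\), the class centers, and the generated samples are illustrative assumptions, not part of the exercise; no intercept column is added, matching \(\hat B\in\mathbb{R}^{p\times K}\) above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, K = 200, 5, 3                        # assumed sizes, for illustration only

centers = rng.normal(size=(K, p))          # assumed class centers
g = rng.integers(0, K, size=N)             # class label g_i of each sample
X = centers[g] + rng.normal(size=(N, p))   # rows of X are x_i^T
Y = np.eye(K)[g]                           # N x K indicator response matrix

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)  # \hat B = (X^T X)^{-1} X^T Y, p x K
Y_hat = X @ B_hat                          # \hat Y = X \hat B; row i is \hat y_i^T
```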
First, \(\pi_k^{\text{new}} = \pi_k\) for \(k=1,...,K\), since the class assignments of the training samples do not change.
Second, by definition of \(\hat\mu_k^{\text{new}}\), and noting again that the class assignments do not change, we have
\[
\hat\mu_k^{\text{new}} = \frac{1}{N_k}\sum_{g_i=k}\hat y_i = \frac{1}{N_k}\sum_{g_i=k}\hat B^Tx_i = \hat B^T\hat\mu_k,
\]
where \(N_k = \#\{i: g_i = k\}\).
Third, by definition of \(\hat\Sigma^{\text{new}}\) and the result above, we have
\[
\begin{aligned}
\hat\Sigma^{\text{new}} &= \frac{1}{N-K}\sum_{k=1}^K\sum_{g_i=k}(\hat y_i-\hat\mu_k^{\text{new}})(\hat y_i-\hat\mu_k^{\text{new}})^T\\
&= \frac{1}{N-K}\sum_{k=1}^K\sum_{g_i=k}\hat B^T(x_i-\hat\mu_k)(x_i-\hat\mu_k)^T\hat B\\
&= \hat B^T\hat\Sigma\hat B.
\end{aligned}
\]
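Both identities can be verified numerically. A sketch under the same illustrative assumptions as above (synthetic data, no intercept column):

```python
import numpy as np

# Same synthetic setup as in the first sketch.
rng = np.random.default_rng(0)
N, p, K = 200, 5, 3
centers = rng.normal(size=(K, p))
g = rng.integers(0, K, size=N)
X = centers[g] + rng.normal(size=(N, p))
Y = np.eye(K)[g]
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ B_hat

def lda_estimates(Z, g, K):
    """Pooled-covariance LDA estimates: class means (as rows) and Sigma hat."""
    mu = np.array([Z[g == k].mean(axis=0) for k in range(K)])
    resid = Z - mu[g]                              # within-class residuals
    return mu, resid.T @ resid / (len(Z) - K)

mu, Sigma = lda_estimates(X, g, K)                 # original space
mu_new, Sigma_new = lda_estimates(Y_hat, g, K)     # transformed space

print(np.allclose(mu_new, mu @ B_hat))                  # mu_k^new = B^T mu_k
print(np.allclose(Sigma_new, B_hat.T @ Sigma @ B_hat))  # Sigma^new = B^T Sigma B
```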
Therefore, the new linear discriminant function is
\[
\begin{aligned}
\delta_k^{\text{new}}(\hat y) &= \hat y^T(\hat\Sigma^{\text{new}})^{-1}\hat\mu_k^{\text{new}} - \frac{1}{2}(\hat\mu_k^{\text{new}})^T(\hat\Sigma^{\text{new}})^{-1}\hat\mu_k^{\text{new}} + \log\pi_k^{\text{new}}\\
&= x^T\hat B(\hat B^T\hat\Sigma\hat B)^{-1}\hat B^T\hat\mu_k - \frac{1}{2}\hat\mu_k^T\hat B(\hat B^T\hat\Sigma\hat B)^{-1}\hat B^T\hat\mu_k + \log\pi_k,
\end{aligned}
\]
which is identical to the discriminant function \(\delta_k(x) = x^T\hat\Sigma^{-1}\hat\mu_k - \frac{1}{2}\hat\mu_k^T\hat\Sigma^{-1}\hat\mu_k + \log\pi_k\) used in the original space. To see this, note that \(\bb{X}^T\bb{X} = (N-K)\hat\Sigma + \sum_{j=1}^K N_j\hat\mu_j\hat\mu_j^T\), so \(\bb{X}^T\bb{X}\,\hat\Sigma^{-1}\hat\mu_k\) is a linear combination of the class means, which are the scaled columns of \(\bb{X}^T\bb{Y}\); hence \(\hat\Sigma^{-1}\hat\mu_k = \hat Ba\) for some \(a\in\mathbb{R}^K\), and \(\hat B(\hat B^T\hat\Sigma\hat B)^{-1}\hat B^T\hat\mu_k = \hat B(\hat B^T\hat\Sigma\hat B)^{-1}(\hat B^T\hat\Sigma\hat B)a = \hat Ba = \hat\Sigma^{-1}\hat\mu_k\), so the two expressions coincide term by term.
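Finally, the identity \(\delta_k^{\text{new}}(\hat y_i) = \delta_k(x_i)\) can itself be checked numerically, again a sketch under the same illustrative assumptions (synthetic data, no intercept column):

```python
import numpy as np

# Same synthetic setup as in the first sketch.
rng = np.random.default_rng(0)
N, p, K = 200, 5, 3
centers = rng.normal(size=(K, p))
g = rng.integers(0, K, size=N)
X = centers[g] + rng.normal(size=(N, p))
Y = np.eye(K)[g]
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ B_hat

def lda_discriminants(Z, g, K):
    """delta_k(z) = z^T S^{-1} mu_k - mu_k^T S^{-1} mu_k / 2 + log pi_k, for all k."""
    pi = np.array([(g == k).mean() for k in range(K)])
    mu = np.array([Z[g == k].mean(axis=0) for k in range(K)])
    resid = Z - mu[g]
    Sigma = resid.T @ resid / (len(Z) - K)
    S_inv_mu = np.linalg.solve(Sigma, mu.T)        # columns are Sigma^{-1} mu_k
    return Z @ S_inv_mu - 0.5 * np.sum(mu.T * S_inv_mu, axis=0) + np.log(pi)

# The N x K matrices of discriminant values agree entry by entry,
# so the two LDA rules classify every point identically.
print(np.allclose(lda_discriminants(X, g, K), lda_discriminants(Y_hat, g, K)))
```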