Ex. 6.4
Suppose that the \(p\) predictors \(X\) arise from sampling relatively smooth analog curves at \(p\) uniformly spaced abscissa values. Denote by \(\text{Cov}(X|Y) = \bm{\Sigma}\) the conditional covariance matrix of the predictors, and assume this does not change much with \(Y\). Discuss the nature of the Mahalanobis choice \(\bb{A} = \bm{\Sigma}^{-1}\) for the metric in (6.14). How does this compare with \(\bb{A} = \bb{I}\)? How might you construct a kernel \(\bb{A}\) that (a) downweights high-frequency components in the distance metric; (b) ignores them completely?
Soln. 6.4
If \(\bb{A} = \bb{I}\), the metric in (6.14) reduces to the ordinary Euclidean distance \(d = \sqrt{(x-x_0)^T(x-x_0)}\), which weights every coordinate equally. The Mahalanobis choice \(\bb{A} = \bm{\Sigma}^{-1}\) instead gives
\[
d = \sqrt{(x-x_0)^T\bm{\Sigma}^{-1}(x-x_0)}.
\]
To see what this does, we first standardize each variable to unit standard deviation,
\[
\tilde{X}_i = \frac{X_i}{\sqrt{\bm{\Sigma}_{ii}}}, \qquad i = 1, \ldots, p.
\]
Then we have
\[
\text{Cov}(\tilde{X}_i, \tilde{X}_j) = \frac{\text{Cov}(X_i, X_j)}{\sqrt{\bm{\Sigma}_{ii}\bm{\Sigma}_{jj}}} = \frac{\bm{\Sigma}_{ij}}{\sqrt{\bm{\Sigma}_{ii}\bm{\Sigma}_{jj}}} = \rho(X_i, X_j).
\]
We see that after standardizing the variables the new covariance matrix is exactly the correlation matrix, so the Mahalanobis metric both standardizes the coordinates and removes the correlation between them, giving every decorrelated direction equal weight. Because the underlying curves are smooth, neighboring predictors are highly correlated: \(\bm{\Sigma}\) has large eigenvalues along smooth (low-frequency) eigenvectors and small eigenvalues along oscillatory (high-frequency) ones. The choice \(\bb{A} = \bm{\Sigma}^{-1}\) therefore inflates exactly the high-frequency components of \(x - x_0\), which are mostly noise, while \(\bb{A} = \bb{I}\) lets the large low-frequency components dominate the distance.
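To make the comparison concrete, here is a minimal numerical sketch; the cubic generating model, the noise level, and the sample sizes are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup (an assumption for illustration): n smooth curves sampled
# at p uniformly spaced points, generated as random cubics plus a small
# amount of high-frequency noise.
p, n = 20, 500
t = np.linspace(0.0, 1.0, p)
basis = np.vstack([np.ones_like(t), t, t**2, t**3])   # (4, p) smooth basis
X = rng.normal(size=(n, 4)) @ basis                   # smooth curves
X += 0.01 * rng.normal(size=(n, p))                   # high-frequency noise

Sigma = np.cov(X, rowvar=False)

# Standardizing each coordinate to unit standard deviation turns the
# covariance matrix into the correlation matrix, as derived above.
D = np.diag(1.0 / np.sqrt(np.diag(Sigma)))
R = D @ Sigma @ D
assert np.allclose(R, np.corrcoef(X, rowvar=False))

# Compare the two metrics on a pair of curves: with A = I the distance is
# dominated by the large low-frequency differences, while A = Sigma^{-1}
# re-weights each direction by its inverse variance and so is driven mostly
# by the tiny high-frequency (noise) components.
x, x0 = X[0], X[1]
d_euclidean = np.sqrt((x - x0) @ (x - x0))
d_mahalanobis = np.sqrt((x - x0) @ np.linalg.solve(Sigma, x - x0))
print(d_euclidean, d_mahalanobis)
```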
(a) In order to downweight the high-frequency components, we can shrink the weight that the metric places on the high-frequency eigendirections of \(\bm{\Sigma}\) (equivalently, of the correlation matrix \(\rho(X_i, X_j)\) after standardization): writing \(\bm{\Sigma} = \sum_j \lambda_j v_j v_j^T\), choose \(\bb{A} = \sum_j \omega_j v_j v_j^T\) with weights \(\omega_j\) that decrease toward zero on the high-frequency eigenvectors \(v_j\) (see the sketch below);
(b) In order to ignore them completely, we can set those weights exactly to zero, so that \(\bb{A} = \bb{B}\bb{B}^T\) becomes a projection onto the low-frequency eigenvectors \(\bb{B} = (v_1, \ldots, v_k)\).
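Below is a minimal sketch of the constructions in (a) and (b), reusing `Sigma`, `x`, and `x0` from the snippet above; the cutoff \(k\) and the geometric shrinkage factor are arbitrary illustrative choices.

```python
import numpy as np

# For smooth curves the leading eigenvectors of Sigma are low-frequency,
# so ordering by eigenvalue orders the directions roughly by frequency.
lam, V = np.linalg.eigh(Sigma)            # eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]            # re-sort: low-frequency first

k = 4                                     # hypothetical frequency cutoff
j = np.arange(len(lam))

# (a) Downweight high-frequency components: full weight on the first k
# eigendirections, geometrically shrinking weight beyond them.
w = np.where(j < k, 1.0, 0.1 ** (j - k + 1.0))
A_down = V @ np.diag(w) @ V.T

# (b) Ignore them completely: a projection onto the low-frequency
# eigenvectors, so high-frequency components contribute exactly zero.
B = V[:, :k]
A_zero = B @ B.T

def dist(x, x0, A):
    """Distance induced by the structured-kernel metric in (6.14)."""
    d = x - x0
    return float(np.sqrt(d @ A @ d))

print(dist(x, x0, np.eye(len(x))),        # A = I
      dist(x, x0, A_down),                # (a) downweighted
      dist(x, x0, A_zero))                # (b) ignored
```

A natural special case of (a) is \(\bb{A} = \bm{\Sigma}\) itself (i.e. \(\omega_j = \lambda_j\)): it weights each component by its variance, which for smooth curves already decays with frequency.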