Ex. 6.4
Suppose that the \(p\) predictors \(X\) arise from sampling relatively smooth analog curves at \(p\) uniformly spaced abscissa values. Denote by \(\text{Cov}(X|Y) = \bm{\Sigma}\) the conditional covariance matrix of the predictors, and assume this does not change much with \(Y\). Discuss the nature of the Mahalanobis choice \(\bb{A} = \bm{\Sigma}^{-1}\) for the metric in (6.14). How does this compare with \(\bb{A} = \bb{I}\)? How might you construct a kernel \(\bb{A}\) that (a) downweights high-frequency components in the distance metric; (b) ignores them completely?
Soln. 6.4
If \(\bb{A} = \bb{I}\), the metric in (6.14) reduces to the ordinary Euclidean distance \(d = \sqrt{(x-x_0)^T(x-x_0)}\), which weights every coordinate equally. The Mahalanobis choice \(\bb{A} = \bm{\Sigma}^{-1}\) instead gives
\[
d = \sqrt{(x-x_0)^T\bm{\Sigma}^{-1}(x-x_0)}.
\]
To see what this does, we first standardize each variable to unit standard deviation,
\[
\tilde{X}_i = \frac{X_i}{\sqrt{\bm{\Sigma}_{ii}}}, \qquad i = 1, \ldots, p.
\]
Then we have
\[
\text{Cov}(\tilde{X}_i, \tilde{X}_j) = \frac{\text{Cov}(X_i, X_j)}{\sqrt{\bm{\Sigma}_{ii}\bm{\Sigma}_{jj}}} = \frac{\bm{\Sigma}_{ij}}{\sqrt{\bm{\Sigma}_{ii}\bm{\Sigma}_{jj}}} = \rho(X_i, X_j).
\]
We see that after standardizing the variables the new covariance matrix is exactly the correlation matrix, so the Mahalanobis metric both standardizes the coordinates and removes the correlation between them, giving every decorrelated direction equal weight. Because the underlying curves are smooth, neighboring predictors are highly correlated: \(\bm{\Sigma}\) has large eigenvalues along smooth (low-frequency) eigenvectors and small eigenvalues along oscillatory (high-frequency) ones. The choice \(\bb{A} = \bm{\Sigma}^{-1}\) therefore inflates exactly the high-frequency components of \(x - x_0\), which are mostly noise, while \(\bb{A} = \bb{I}\) lets the large low-frequency components dominate the distance.
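To make the comparison concrete, here is a minimal numerical sketch; the cubic generating model, the noise level, and the sample sizes are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup (an assumption for illustration): n smooth curves sampled
# at p uniformly spaced points, generated as random cubics plus a small
# amount of high-frequency noise.
p, n = 20, 500
t = np.linspace(0.0, 1.0, p)
basis = np.vstack([np.ones_like(t), t, t**2, t**3])   # (4, p) smooth basis
X = rng.normal(size=(n, 4)) @ basis                   # smooth curves
X += 0.01 * rng.normal(size=(n, p))                   # high-frequency noise

Sigma = np.cov(X, rowvar=False)

# Standardizing each coordinate to unit standard deviation turns the
# covariance matrix into the correlation matrix, as derived above.
D = np.diag(1.0 / np.sqrt(np.diag(Sigma)))
R = D @ Sigma @ D
assert np.allclose(R, np.corrcoef(X, rowvar=False))

# Compare the two metrics on a pair of curves: with A = I the distance is
# dominated by the large low-frequency differences, while A = Sigma^{-1}
# re-weights each direction by its inverse variance and so is driven mostly
# by the tiny high-frequency (noise) components.
x, x0 = X[0], X[1]
d_euclidean = np.sqrt((x - x0) @ (x - x0))
d_mahalanobis = np.sqrt((x - x0) @ np.linalg.solve(Sigma, x - x0))
print(d_euclidean, d_mahalanobis)
```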
(a) In order to downweight the high-frequency components, we can shrink the weight that the metric places on the high-frequency eigendirections of \(\bm{\Sigma}\) (equivalently, of the correlation matrix \(\rho(X_i, X_j)\) after standardization): writing \(\bm{\Sigma} = \sum_j \lambda_j v_j v_j^T\), choose \(\bb{A} = \sum_j \omega_j v_j v_j^T\) with weights \(\omega_j\) that decrease toward zero on the high-frequency eigenvectors \(v_j\) (see the sketch below);
(b) In order to ignore them completely, we can set those weights exactly to zero, so that \(\bb{A} = \bb{B}\bb{B}^T\) becomes a projection onto the low-frequency eigenvectors \(\bb{B} = (v_1, \ldots, v_k)\).
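Below is a minimal sketch of the constructions in (a) and (b), reusing `Sigma`, `x`, and `x0` from the snippet above; the cutoff \(k\) and the geometric shrinkage factor are arbitrary illustrative choices.

```python
import numpy as np

# For smooth curves the leading eigenvectors of Sigma are low-frequency,
# so ordering by eigenvalue orders the directions roughly by frequency.
lam, V = np.linalg.eigh(Sigma)            # eigenvalues in ascending order
lam, V = lam[::-1], V[:, ::-1]            # re-sort: low-frequency first

k = 4                                     # hypothetical frequency cutoff
j = np.arange(len(lam))

# (a) Downweight high-frequency components: full weight on the first k
# eigendirections, geometrically shrinking weight beyond them.
w = np.where(j < k, 1.0, 0.1 ** (j - k + 1.0))
A_down = V @ np.diag(w) @ V.T

# (b) Ignore them completely: a projection onto the low-frequency
# eigenvectors, so high-frequency components contribute exactly zero.
B = V[:, :k]
A_zero = B @ B.T

def dist(x, x0, A):
    """Distance induced by the structured-kernel metric in (6.14)."""
    d = x - x0
    return float(np.sqrt(d @ A @ d))

print(dist(x, x0, np.eye(len(x))),        # A = I
      dist(x, x0, A_down),                # (a) downweighted
      dist(x, x0, A_zero))                # (b) ignored
```

A natural special case of (a) is \(\bb{A} = \bm{\Sigma}\) itself (i.e. \(\omega_j = \lambda_j\)): it weights each component by its variance, which for smooth curves already decays with frequency.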