Ex. 2.4

The edge effect problem discussed on page 23 is not peculiar to uniform sampling from bounded domains. Consider inputs drawn from a spherical multinormal distribution \(X\sim N(0, \textbf{I}_p)\). The squared distance from any sample point to the origin has a \(\chi_p^2\) distribution with mean \(p\). Consider a prediction point \(x_0\) drawn from this distribution, and let \(a=x_0/\|x_0\|\) be an associated unit vector. Let \(z_i=a^Tx_i\) be the projection of each of the training points on this direction.

Show that the \(z_i\) are distributed \(N(0,1)\) with expected squared distance from origin 1, while the target point has expected squared distance \(p\) from the origin. Hence for \(p=10\), a randomly drawn test point is about 3.1 standard deviations from the origin, while all the training points are on average one standard deviation along direction \(a\). So most prediction points see themselves as lying on the edge of the training set.

Soln. 2.4

Since \(z_i = a^Tx_i\), \(z_i\) is a linear combination of standard normal random variables, thus \(z_i\) is normal. It's easy to see that \(E[z_i] = 0\) and

\[\begin{equation} \text{Var}(z_i) = \|a\|^2\text{Var}(x_i) = \text{Var}(x_i) = 1.\nonumber \end{equation}\]

There we know \(z_i\sim N(0,1)\) and the expected squared distance from origin is just its variance, which is 1. As for the target point \(x_t\), its squared distance to origin follows a \(\chi_p^2\) distribution and thus has mean \(p\).

For \(p=10\), we have

\[\begin{equation} \text{sd}(x_t) = \sqrt{\text{Var}(x_t)} = \sqrt{10} \approx 3.16.\nonumber \end{equation}\]