Ex. 2.4

The edge effect problem discussed on page 23 is not peculiar to uniform sampling from bounded domains. Consider inputs drawn from a spherical multinormal distribution $X \sim N(0, \mathbf{I}_p)$. The squared distance from any sample point to the origin has a $\chi^2_p$ distribution with mean $p$. Consider a prediction point $x_0$ drawn from this distribution, and let $a = x_0 / \|x_0\|$ be an associated unit vector. Let $z_i = a^T x_i$ be the projection of each of the training points on this direction.

Show that the $z_i$ are distributed $N(0,1)$ with expected squared distance from the origin 1, while the target point has expected squared distance $p$ from the origin. Hence for $p = 10$, a randomly drawn test point is about 3.1 standard deviations from the origin, while all the training points are on average one standard deviation along direction $a$. So most prediction points see themselves as lying on the edge of the training set.

Soln. 2.4

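The direction $a$ depends only on the prediction point, so it is independent of the training point $x_i$. Conditional on $a$, $z_i = a^T x_i$ is a linear combination of independent standard normal random variables, and is therefore normal. We have $E[z_i] = a^T E[x_i] = 0$ and

$$\operatorname{Var}(z_i) = a^T \operatorname{Var}(x_i)\, a = a^T \mathbf{I}_p\, a = \|a\|^2 = 1.$$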

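Therefore $z_i \sim N(0,1)$, and since its mean is zero, its expected squared distance from the origin is $E[z_i^2] = \operatorname{Var}(z_i) = 1$. As for the target point $x_t$ (the $x_0$ of the exercise), its squared distance from the origin, $\|x_t\|^2$, follows a $\chi^2_p$ distribution and thus has mean $p$.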
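
The mean of the squared distance can also be checked directly, without appealing to the $\chi^2_p$ distribution. The following is a short sketch, where $x_{tj}$ denotes the $j$-th coordinate of the target point (a notation introduced here, not in the exercise):

```latex
E\|x_t\|^2
  = \sum_{j=1}^{p} E\left[x_{tj}^2\right]
  = \sum_{j=1}^{p} \operatorname{Var}(x_{tj})
  = p .
```

Each coordinate is standard normal with mean zero, so each term contributes exactly 1.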

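For $p = 10$, the root-mean-square distance of the target point from the origin is

$$\sqrt{E\|x_t\|^2} = \sqrt{p} = \sqrt{10} \approx 3.16,$$

so a typical test point lies about 3.16 standard deviations from the origin, while the projections of the training points have standard deviation 1.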
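
The result can be sanity-checked numerically. The following is a minimal simulation sketch, assuming NumPy is available; the function name `simulate_edge_effect` and its parameters (`n_train`, `n_trials`, `seed`) are hypothetical choices made for this illustration and are not part of the exercise.

```python
import numpy as np


def simulate_edge_effect(p=10, n_train=500, n_trials=2000, seed=0):
    """Estimate E[z_i^2] and E[||x0||^2] by Monte Carlo (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    mean_z_sq = np.empty(n_trials)
    x0_sq = np.empty(n_trials)
    for t in range(n_trials):
        # Training points and one prediction point, all drawn from N(0, I_p).
        X = rng.standard_normal((n_train, p))
        x0 = rng.standard_normal(p)
        # Unit vector in the direction of the prediction point.
        a = x0 / np.linalg.norm(x0)
        # Projections z_i = a^T x_i of the training points onto a.
        z = X @ a
        mean_z_sq[t] = np.mean(z ** 2)
        x0_sq[t] = x0 @ x0
    return float(mean_z_sq.mean()), float(x0_sq.mean())


mean_z_sq, mean_x0_sq = simulate_edge_effect()
print(f"average squared projection of training points: {mean_z_sq:.3f}")
print(f"average squared distance of prediction point:  {mean_x0_sq:.3f}")
print(f"root-mean-square distance of prediction point: {np.sqrt(mean_x0_sq):.3f}")
```

Under these assumptions the first estimate should be close to 1 and the second close to $p$, up to Monte Carlo error that shrinks as `n_trials` grows.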