Ex. 6.8
Suppose that for continuous response \(Y\) and predictor \(X\), we model the joint density of \(X, Y\) using a multivariate Gaussian kernel estimator. Note that the kernel in this case would be the product kernel \(\phi_\lambda(X)\phi_\lambda(Y)\). Show that the conditional mean \(E[Y|X]\) derived from this estimate is a Nadaraya-Watson estimator. Extend this result to classification by providing a suitable kernel for the estimation of the joint distribution of a continuous \(X\) and discrete \(Y\).
Soln. 6.8
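By definition,
\[
\hat{E}[Y\mid X=x]=\int y\,\hat{f}_{Y|X}(y\mid x)\,dy=\frac{\int y\,\hat{f}_{X,Y}(x,y)\,dy}{\hat{f}_X(x)}.
\]
With \(\phi_\lambda\) the Gaussian density with mean zero and standard deviation \(\lambda\), the kernel estimates are (see (6.23) in the text)
\[
\hat{f}_{X,Y}(x,y)=\frac{1}{N}\sum_{i=1}^N\phi_\lambda(x-x_i)\,\phi_\lambda(y-y_i),\qquad
\hat{f}_X(x)=\frac{1}{N}\sum_{i=1}^N\phi_\lambda(x-x_i).
\]
Note that \(\hat{f}_X\) is exactly the marginal of \(\hat{f}_{X,Y}\), because \(\int\phi_\lambda(y-y_i)\,dy=1\). Thus we have
\begin{align}
\hat{E}[Y\mid X=x]&=\frac{\sum_{i=1}^N\phi_\lambda(x-x_i)\int y\,\phi_\lambda(y-y_i)\,dy}{\sum_{i=1}^N\phi_\lambda(x-x_i)}\nonumber\\
&=\frac{\sum_{i=1}^N\phi_\lambda(x-x_i)\,y_i}{\sum_{i=1}^N\phi_\lambda(x-x_i)},\label{eq:6-8a}
\end{align}
which is exactly the Nadaraya-Watson estimator with Gaussian kernel \(\phi_\lambda\). The last equality follows from
\[
\int y\,\phi_\lambda(y-y_i)\,dy=y_i,
\]
since \(\phi_\lambda(\cdot-y_i)\) is the Gaussian density with mean \(y_i\).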
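As a numerical sanity check of this derivation, the sketch below (plain Python, standard library only) evaluates the product-kernel joint estimate on a fine grid in \(y\), integrates \(y\) out with the midpoint rule, and compares the result with the Nadaraya-Watson average. The function names, toy data, bandwidth, and integration range are illustrative assumptions, not part of the exercise.

```python
import math


def phi(u, lam):
    """Gaussian density with mean 0 and standard deviation lam (phi_lambda)."""
    return math.exp(-0.5 * (u / lam) ** 2) / (lam * math.sqrt(2.0 * math.pi))


def joint_kde(x, y, xs, ys, lam):
    """Product-kernel estimate f(x, y) = (1/N) * sum_i phi(x - x_i) * phi(y - y_i)."""
    return sum(phi(x - xi, lam) * phi(y - yi, lam) for xi, yi in zip(xs, ys)) / len(xs)


def marginal_kde(x, xs, lam):
    """Gaussian KDE of X alone: (1/N) * sum_i phi(x - x_i)."""
    return sum(phi(x - xi, lam) for xi in xs) / len(xs)


def cond_mean_from_joint(x, xs, ys, lam, lo=-20.0, hi=20.0, m=4000):
    """Integrate y * f(x, y) and f(x, y) over y with the midpoint rule.

    Returns (conditional mean, integral of f(x, y) dy); the second value
    should match marginal_kde(x, xs, lam) up to quadrature error.
    """
    h = (hi - lo) / m
    num = den = 0.0
    for k in range(m):
        y = lo + (k + 0.5) * h
        f = joint_kde(x, y, xs, ys, lam)
        num += y * f * h
        den += f * h
    return num / den, den


def nadaraya_watson(x, xs, ys, lam):
    """Kernel-weighted average of the y_i with weights phi(x - x_i)."""
    w = [phi(x - xi, lam) for xi in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


# Toy data, made up purely for illustration.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1.0, 0.2, -0.3, 0.8, 2.0]
mean_joint, mass = cond_mean_from_joint(0.7, xs, ys, lam=0.5)
print(mean_joint, nadaraya_watson(0.7, xs, ys, lam=0.5))  # agree up to quadrature error
print(mass, marginal_kde(0.7, xs, lam=0.5))               # agree up to quadrature error
```

The bandwidth of the \(Y\)-kernel drops out of the conditional mean entirely, because \(\int y\,\phi_\lambda(y-y_i)\,dy=y_i\) for every \(\lambda\); the quadrature above only mimics the integral literally.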
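Now consider the case when \(Y\) is discrete. Assume that \(Y\) takes values in a set \(J\subset \mathbb{Z}=\{\cdots, -1, 0, 1, \cdots\}\). If we use the naive frequency estimate \(\hat{P}(Y=i)=N_i/N\) for \(Y\), combined with a Gaussian kernel estimate of \(X\) within each category, then we have
\[
\hat{f}_{X,Y}(x,i)=\frac{N_i}{N}\cdot\frac{1}{N_i}\sum_{x_j\in C(i)}\phi_\lambda(x-x_j)=\frac{1}{N}\sum_{x_j\in C(i)}\phi_\lambda(x-x_j),\qquad i\in J,
\]
where \(C(i)\) is the set of \(x_j\)'s whose corresponding \(y_j\)'s are in category \(i\), and \(N_i\) is the size of \(C(i)\). Summing over \(i\in J\) gives \(\hat{f}_X(x)=\frac{1}{N}\sum_{j=1}^N\phi_\lambda(x-x_j)\), so
\[
\hat{E}[Y\mid X=x]=\frac{\sum_{i\in J} i\,\hat{f}_{X,Y}(x,i)}{\hat{f}_X(x)}=\frac{\sum_{j=1}^N\phi_\lambda(x-x_j)\,y_j}{\sum_{j=1}^N\phi_\lambda(x-x_j)},
\]
which has the same Nadaraya-Watson form as \(\eqref{eq:6-8a}\). For classification, the same computation gives the class posterior \(\hat{P}(Y=i\mid X=x)=\sum_{x_j\in C(i)}\phi_\lambda(x-x_j)\big/\sum_{j=1}^N\phi_\lambda(x-x_j)\), which is the Nadaraya-Watson estimate of the indicator \(I(Y=i)\).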
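The discrete case can be checked the same way. The sketch below (standard-library Python; the helper names and the toy labels in \(J=\{-1,0,2\}\) are hypothetical) builds \(\hat{f}_{X,Y}(x,i)\) from the category sets \(C(i)\) and sizes \(N_i\), then compares the resulting conditional mean with the Nadaraya-Watson estimator and returns the class posteriors \(\hat{P}(Y=i\mid X=x)\).

```python
import math


def phi(u, lam):
    """Gaussian density with mean 0 and standard deviation lam."""
    return math.exp(-0.5 * (u / lam) ** 2) / (lam * math.sqrt(2.0 * math.pi))


def joint_estimate(x, i, xs, ys, lam):
    """f(x, i) = (N_i / N) * (1 / N_i) * sum over x_j in C(i) of phi(x - x_j)."""
    members = [xj for xj, yj in zip(xs, ys) if yj == i]  # C(i)
    n_i = len(members)                                  # N_i
    if n_i == 0:
        return 0.0  # an unobserved category gets zero estimated mass
    return (n_i / len(xs)) * sum(phi(x - xj, lam) for xj in members) / n_i


def conditional_from_joint(x, xs, ys, lam):
    """Return (E[Y | X = x], {i: P(Y = i | X = x)}) computed from the joint estimate."""
    classes = sorted(set(ys))  # categories never observed contribute nothing
    joint = {i: joint_estimate(x, i, xs, ys, lam) for i in classes}
    marginal = sum(joint.values())  # summing over i recovers the Gaussian KDE of X
    posterior = {i: p / marginal for i, p in joint.items()}
    return sum(i * p for i, p in posterior.items()), posterior


def nadaraya_watson(x, xs, ys, lam):
    """Kernel-weighted average of the y_j with weights phi(x - x_j)."""
    w = [phi(x - xj, lam) for xj in xs]
    return sum(wj * yj for wj, yj in zip(w, ys)) / sum(w)


# Toy data with integer labels, made up for illustration.
xs = [0.1, 0.4, 0.9, 1.3, 1.8, 2.2]
ys = [-1, 0, 0, 2, 2, -1]
mean, post = conditional_from_joint(1.0, xs, ys, lam=0.4)
print(mean, nadaraya_watson(1.0, xs, ys, lam=0.4))  # equal up to rounding
print(post)  # class posteriors, summing to 1
```

Coding the labels as integers only matters for the conditional mean; the class posteriors, which are what a classifier actually uses, do not depend on the numeric codes.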
Such an estimate can be viewed as using the product kernel \(\phi_\lambda(x-x_j)\,I(y=y_j)\): a Gaussian kernel for the continuous component combined with an indicator (frequency) kernel for the discrete one. We can further refine it by also smoothing over the discrete component \(Y\) with a discrete window weight function (see, e.g., Nonparametric estimation of joint discrete-continuous probability densities with applications).
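As one concrete choice of discrete window weight function, the sketch below uses the Aitchison–Aitken kernel, which puts weight \(1-\gamma\) on the observed category and \(\gamma/(c-1)\) on each of the other \(c-1\) categories. This choice is an assumption for illustration only; the cited paper may use a different weight function or bandwidth selection. The names `aitchison_aitken` and `smoothed_posterior`, the smoothing parameter \(\gamma\), and the toy data are hypothetical. Setting \(\gamma=0\) recovers the frequency estimate, while larger \(\gamma\le (c-1)/c\) shrinks the class posteriors toward the uniform distribution.

```python
import math


def phi(u, lam):
    """Gaussian density with mean 0 and standard deviation lam."""
    return math.exp(-0.5 * (u / lam) ** 2) / (lam * math.sqrt(2.0 * math.pi))


def aitchison_aitken(i, yj, gamma, c):
    """Assumed discrete kernel on c unordered categories.

    Weight 1 - gamma on a match, gamma / (c - 1) otherwise, so the weights
    sum to 1 over the c categories; requires c >= 2 and 0 <= gamma <= (c - 1) / c.
    """
    return 1.0 - gamma if i == yj else gamma / (c - 1)


def smoothed_posterior(x, xs, ys, classes, lam, gamma):
    """P(Y = i | X = x) from the joint estimate with kernel phi(x - x_j) * L(i, y_j)."""
    c = len(classes)
    wx = [phi(x - xj, lam) for xj in xs]
    # Each L(., y_j) sums to 1 over the classes, so the X-marginal is the plain
    # Gaussian KDE and the normalizer is just the sum of the continuous weights.
    total = sum(wx)
    return {
        i: sum(w * aitchison_aitken(i, yj, gamma, c) for w, yj in zip(wx, ys)) / total
        for i in classes
    }


# Toy data, made up for illustration.
xs = [0.1, 0.4, 0.9, 1.3, 1.8, 2.2]
ys = [-1, 0, 0, 2, 2, -1]
classes = [-1, 0, 2]
print(smoothed_posterior(1.0, xs, ys, classes, lam=0.4, gamma=0.0))  # frequency estimate
print(smoothed_posterior(1.0, xs, ys, classes, lam=0.4, gamma=0.2))  # shrunk toward uniform
```

Because the discrete weights attached to each \(y_j\) sum to one, this smoothing leaves \(\hat{f}_X\) unchanged and only redistributes the mass across classes, which can help when some categories have very few observations.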