Ex. 18.8
Data Piling. Exercise 4.2 shows that the two-class LDA solution can be obtained by a linear regression of a binary response vector \(\by\) consisting of -1s and +1s. The prediction \(\hat\beta^Tx\) for any \(x\) is (up to a scale and shift) the LDA score \(\delta(x)\). Suppose now that \(p\gg N\).
(a) Consider the linear regression model \(f(x)=\alpha + \beta^Tx\) fit to a binary response \(Y\in \{-1, +1\}\). Using Exercise 18.7, show that there are infinitely many directions defined by \(\hat\beta\) in \(\mathbb{R}^p\) onto which the data project to exactly two points, one for each class. These are known as data piling directions (Ahn and Marron, 2005).
(b) Show that the distance between the projected points is \(2/\|\hat\beta\|\), and hence these directions define separating hyperplanes with that margin.
(c) Argue that there is a single maximal data piling direction for which this distance is largest, and is defined by \(\hat\beta_0=\bV\bD^{-1}\bU^T\by=\bX^{-}\by\), where \(\bX=\bU\bD\bV^T\) is the SVD of \(\bX\).
Soln. 18.8
(a) This follows directly from Ex. 18.7 (a): since \(p\gg N\), there are infinitely many least-squares solutions \((\hat\alpha, \hat\beta)\), all with zero residuals, i.e., \(\hat\alpha + \hat\beta^Tx_i = y_i\) for every \(i\). For any such solution the projections \(\hat\beta^Tx_i = y_i - \hat\alpha\) take exactly two values, \(1-\hat\alpha\) for the \(+1\) class and \(-1-\hat\alpha\) for the \(-1\) class, so each such \(\hat\beta\) defines a data piling direction.
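To make this concrete, here is a small numerical sketch (not part of the original solution) on hypothetical synthetic Gaussian data with \(p\gg N\): a zero-residual least-squares fit piles the projections onto two points, and adding any null-space component of the augmented data matrix produces another such direction.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 100                               # p >> N
X = rng.normal(size=(N, p))
y = np.r_[np.ones(N // 2), -np.ones(N // 2)] # binary response coded +1 / -1

# Zero-residual least-squares fit of y on (1, X); lstsq returns the minimum-norm solution.
Xa = np.c_[np.ones(N), X]
coef = np.linalg.lstsq(Xa, y, rcond=None)[0]
alpha, beta = coef[0], coef[1:]

proj = X @ beta                              # projections onto the direction beta
print(np.unique(proj.round(8)))              # exactly two values: -1 - alpha and 1 - alpha

# Any null-space vector of Xa gives another zero-residual solution, hence another
# data piling direction; the null space has dimension p + 1 - N > 0, so there are
# infinitely many of them.
Vt = np.linalg.svd(Xa)[2]                    # rows N..p of Vt span the null space of Xa
coef2 = coef + 5.0 * Vt[-1]
print(np.unique((X @ coef2[1:]).round(8)))   # still piles onto exactly two points
```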
(b) Take any \(x_1\) from the \(+1\) class and any \(x_2\) from the \(-1\) class. Since the residuals are zero, \(\hat\alpha + \hat\beta^Tx_1 = 1\) and \(\hat\alpha + \hat\beta^Tx_2 = -1\), so \(\hat\beta^T(x_1 - x_2) = 2\). Projecting onto the unit vector \(\hat\beta/\|\hat\beta\|\), the distance between the two piled points is therefore \(\hat\beta^T(x_1-x_2)/\|\hat\beta\| = 2/\|\hat\beta\|\). Moreover, by (4.40) in Section 4.5 (Separating Hyperplanes), the signed distance from \(x_i\) to the hyperplane \(\{x: \hat\alpha + \hat\beta^Tx = 0\}\) is \((\hat\alpha + \hat\beta^Tx_i)/\|\hat\beta\| = y_i/\|\hat\beta\|\), so this hyperplane separates the two classes with the piled points a distance \(2/\|\hat\beta\|\) apart, which is the stated margin.
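A quick numerical check of this identity (again a sketch on synthetic data, repeating the same minimum-norm least-squares fit as in the sketch for part (a)):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 20, 100
X = rng.normal(size=(N, p))
y = np.r_[np.ones(N // 2), -np.ones(N // 2)]

Xa = np.c_[np.ones(N), X]
coef = np.linalg.lstsq(Xa, y, rcond=None)[0]
beta = coef[1:]

# Distance between the two piled points after projecting onto the unit vector beta/||beta||.
u = beta / np.linalg.norm(beta)
gap = (X[y == 1][0] - X[y == -1][0]) @ u
print(gap, 2 / np.linalg.norm(beta))          # both numbers equal 2 / ||beta||
```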
(c) This follows from Ex. 18.7 (c), where we proved that \(\hat\beta_0 = \bV\bD^{-1}\bU^T\by\) is the zero-residual solution with the smallest Euclidean norm. By (b), it therefore achieves the largest distance (margin) \(2/\|\hat\beta\|\) among all data piling directions, and so it is the single maximal data piling direction.
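As a companion check (a sketch on synthetic data, with the intercept dropped for simplicity since the classes are balanced), the maximal data piling direction can be computed from the SVD as \(\hat\beta_0=\bV\bD^{-1}\bU^T\by\); perturbing it by a null-space vector preserves the piling property but increases the norm and therefore shrinks the margin \(2/\|\hat\beta\|\).

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 20, 100                                # p >> N
X = rng.normal(size=(N, p))
y = np.r_[np.ones(N // 2), -np.ones(N // 2)]

# Maximal data piling direction beta0 = V D^{-1} U^T y = X^- y (minimum-norm exact solution).
U, d, Vt = np.linalg.svd(X, full_matrices=False)
beta0 = Vt.T @ (U.T @ y / d)
print(np.allclose(X @ beta0, y))              # True: projections pile exactly at -1 and +1

# Any other piling direction adds a null-space component, which is orthogonal to the
# row space containing beta0, so it can only increase the norm and shrink the margin.
Vt_full = np.linalg.svd(X)[2]                 # rows N..p-1 span the null space of X
beta1 = beta0 + 3.0 * Vt_full[-1]
print(np.allclose(X @ beta1, y))              # True: still a data piling direction
print(2 / np.linalg.norm(beta0), 2 / np.linalg.norm(beta1))
# beta0 attains the larger margin 2 / ||beta||
```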