Ex. 4.6
Suppose we have \(N\) points \(x_i\) in \(\mathbb{R}^p\) in general position, with class labels \(y_i\in\{-1,1\}\). Prove that the perceptron learning algorithm converges to a separating hyperplane in a finite number of steps:
- (a) Denote a hyperplane by \(f(x) = \beta_1^Tx + \beta_0 = 0\) or in a more compact notation \(\beta^Tx^\ast=0\), where \(x^\ast = (x,1)\) and \(\beta=(\beta_1, \beta_0)\). Let \(z_i = x_i^\ast/\|x_i^\ast\|\). Show that separability implies the existence of a \(\beta_{\text{sep}}\) such that \(y_i\beta^T_{\text{sep}}z_i\ge 1\ \ \forall i\).
- (b) Given a current \(\beta_{\text{old}}\), the perceptron algorithm identifies a point \(z_i\) that is misclassified, and produces the update \(\beta_{\text{new}}\leftarrow \beta_{\text{old}} + y_iz_i\). Show that \(\|\beta_{\text{new}}-\beta_{\text{sep}}\|^2\le \|\beta_{\text{old}}-\beta_{\text{sep}}\|^2 - 1\), and hence that the algorithm converges to a separating hyperplane in no more than \(\|\beta_{\text{start}}-\beta_{\text{sep}}\|^2\) steps (Pattern Recognition and Neural Networks).
Soln. 4.6
- (a) By definition of separability, there exists a \(\beta\) such that
\[\begin{eqnarray}
\beta^Tx_i^\ast &>& 0 \ \ \text{for}\ \ y_i = 1,\non\\
\beta^Tx_i^\ast &<& 0 \ \ \text{for}\ \ y_i = -1.\non
\end{eqnarray}\]
Thus \(y_i\beta^Tx_i^\ast > 0\) for all \(i\), and since \(z_i = x_i^\ast/\|x_i^\ast\|\) with \(\|x_i^\ast\| > 0\), also \(y_i\beta^Tz_i > 0\) for all \(i\).
Define
\[\begin{equation}
m := \min_{i} y_i\beta^Tz_i > 0.\non
\end{equation}\]
Thus \(y_i\left(\frac{1}{m}\beta\right)^Tz_i\ge 1\) for all \(i\), so \(\beta_{\text{sep}} := \frac{1}{m}\beta\) satisfies \(y_i\beta^T_{\text{sep}}z_i\ge 1\ \ \forall i\).
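The construction in (a) can be checked numerically. Below is a minimal sketch (assuming NumPy; the two Gaussian clouds and the separating direction \(\beta=(1,1,0)\) are hypothetical choices, not part of the exercise): it builds the \(z_i\), verifies \(y_i\beta^Tz_i>0\), and rescales by \(m\) so the smallest margin becomes exactly 1.

```python
import numpy as np

# Hypothetical toy data, assumed linearly separable: two Gaussian clouds.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, size=(20, 2)), rng.normal(3, 1, size=(20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])

# Augment each point with a 1 for the intercept and normalize: z_i = x_i^* / ||x_i^*||.
X_star = np.hstack([X, np.ones((X.shape[0], 1))])
Z = X_star / np.linalg.norm(X_star, axis=1, keepdims=True)

# Any separating beta works; the direction (1, 1, 0) is an assumption for this toy data.
beta = np.array([1.0, 1.0, 0.0])
margins = y * (Z @ beta)          # y_i * beta^T z_i
assert np.all(margins > 0), "beta does not separate the data"

# Rescale by m = min_i y_i beta^T z_i so the smallest margin becomes 1.
m = margins.min()
beta_sep = beta / m
print((y * (Z @ beta_sep)).min())  # >= 1 by construction
```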
- (b) We have
\[\begin{eqnarray}
\|\beta_{\text{new}} - \beta_{\text{sep}}\|^2 &=& \|\beta_{\text{old}} - \beta_{\text{sep}} + y_iz_i\|^2\non\\
&=&\|\beta_{\text{old}} - \beta_{\text{sep}}\|^2 + \|y_iz_i\|^2 + 2y_i(\beta_{\text{old}} - \beta_{\text{sep}})^Tz_i\non\\
&=&\|\beta_{\text{old}} - \beta_{\text{sep}}\|^2 + 1 + 2y_i\beta_{\text{old}}^Tz_i - 2y_i\beta_{\text{sep}}^Tz_i\non\\
&\le&\|\beta_{\text{old}} - \beta_{\text{sep}}\|^2 + 1 + 2\cdot 0 - 2 \cdot 1 \non\\
&=&\|\beta_{\text{old}} - \beta_{\text{sep}}\|^2-1.\non
\end{eqnarray}\]
The inequality uses \(\|z_i\| = 1\), the fact that \(z_i\) is misclassified by \(\beta_{\text{old}}\) (so \(y_i\beta_{\text{old}}^Tz_i\le 0\)), and \(y_i\beta_{\text{sep}}^Tz_i\ge 1\) from part (a). Since each update decreases the squared distance to \(\beta_{\text{sep}}\) by at least 1 and this distance cannot fall below 0, the algorithm makes at most \(\|\beta_{\text{start}}-\beta_{\text{sep}}\|^2\) updates before no misclassified points remain, i.e., it converges to a separating hyperplane in a finite number of steps.
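As a sanity check on the bound in (b), here is a minimal sketch of the perceptron update on the \(z_i\) (again assuming NumPy; the toy data, the starting point \(\beta_{\text{start}}=0\), and the particular \(\beta_{\text{sep}}\) are hypothetical): it counts the number of updates and compares it with \(\|\beta_{\text{start}}-\beta_{\text{sep}}\|^2\).

```python
import numpy as np

# Same hypothetical toy data as in the part (a) sketch.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, size=(20, 2)), rng.normal(3, 1, size=(20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])

X_star = np.hstack([X, np.ones((X.shape[0], 1))])           # x_i^* = (x_i, 1)
Z = X_star / np.linalg.norm(X_star, axis=1, keepdims=True)   # z_i = x_i^* / ||x_i^*||

beta = np.zeros(3)                                # beta_start = 0 (an assumption)
updates = 0
while True:
    miscls = np.flatnonzero(y * (Z @ beta) <= 0)  # points with y_i beta^T z_i <= 0
    if miscls.size == 0:
        break                                     # separating hyperplane found
    i = miscls[0]
    beta = beta + y[i] * Z[i]                     # beta_new <- beta_old + y_i z_i
    updates += 1

# Bound from part (b): number of updates <= ||beta_start - beta_sep||^2,
# with beta_sep the margin-1 rescaling constructed in part (a).
beta_any = np.array([1.0, 1.0, 0.0])              # assumed separating direction
beta_sep = beta_any / (y * (Z @ beta_any)).min()
print(updates, np.linalg.norm(beta_sep) ** 2)     # = ||beta_start - beta_sep||^2 since beta_start = 0
```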