A Solution Manual for ESL

Ex. 4.7

YuhangZhou88/ESL_Solution

Consider the criterion

\[\begin{equation} D^\ast(\beta,\beta_0) = -\sum_{i=1}^N y_i(x_i^T\beta + \beta_0), \nonumber \end{equation}\]

a generalization of (4.41) in the textbook where we sum over all the observations. Consider minimizing \(D^\ast\) subject to \(\|\beta\|=1\). Describe this criterion in words. Does it solve the optimal separating hyperplane problem?

Soln. 4.7

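When \(\|\beta\| = 1\), \(x_i^T\beta + \beta_0\) is the signed distance from \(x_i\) to the hyperplane \(x^T\beta + \beta_0 = 0\), so \(y_i(x_i^T\beta + \beta_0)\) is positive when \(x_i\) is correctly classified and negative otherwise. In words, minimizing \(D^\ast\) maximizes the sum of these signed margins over all observations, not only over the misclassified ones as in (4.41).

This does not solve the optimal separating hyperplane problem. The optimal separating hyperplane solves a max-min problem: it maximizes the margin \(M\) subject to the pointwise constraints \(y_i(x_i^T\beta + \beta_0) \ge M\) for every \(i\). Minimizing \(D^\ast\) imposes no such pointwise constraint, so large margins on some points can offset negative margins on others, and the minimizer need not separate the classes even when they are separable.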
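
A small numerical check may make this concrete. Writing \(D^\ast(\beta,\beta_0) = -\beta^T\sum_i y_i x_i - \beta_0\sum_i y_i\), the Cauchy-Schwarz inequality gives the unit-norm minimizer in \(\beta\) as \(\sum_i y_i x_i / \|\sum_i y_i x_i\|\). The sketch below uses a hypothetical four-point data set with balanced classes (so that \(\sum_i y_i = 0\) and \(\beta_0\) drops out of \(D^\ast\)); the names `d_star`, `beta_hat` and `beta_sep` are illustrative and not from the textbook.

```python
import numpy as np

# Hypothetical toy data: the classes are separated by the line x2 = 0.
X = np.array([[0.0, 1.0], [10.0, 1.0], [-10.0, -1.0], [2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])  # balanced classes, so sum(y) == 0


def d_star(beta, beta0):
    """D*(beta, beta0) = -sum_i y_i (x_i^T beta + beta0)."""
    return -np.sum(y * (X @ beta + beta0))


def separates(beta):
    """True if some threshold beta0 splits the classes along direction beta."""
    proj = X @ beta
    return bool(proj[y > 0].min() > proj[y < 0].max())


# Unit-norm minimizer of D* in beta: the normalized vector sum_i y_i x_i.
s = X.T @ y
beta_hat = s / np.linalg.norm(s)

# A direction that does separate the data (normal to the line x2 = 0).
beta_sep = np.array([0.0, 1.0])

print("D* at beta_hat:", d_star(beta_hat, 0.0), "separates:", separates(beta_hat))
print("D* at beta_sep:", d_star(beta_sep, 0.0), "separates:", separates(beta_sep))
```

On this data `beta_hat` attains a smaller \(D^\ast\) than `beta_sep`, yet no choice of \(\beta_0\) separates the classes along `beta_hat`: the point \((2,-1)\) projects further along `beta_hat` than \((0,1)\) does. With unbalanced classes \(\sum_i y_i \ne 0\), and \(D^\ast\) is unbounded below in \(\beta_0\), which is a further sign that the criterion is not a margin criterion.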