A Solution Manual for ESL

Ex. 2.1

YuhangZhou88/ESL_Solution

Suppose each of \(K\) classes has an associated target \(t_k\), which is a vector of all zeros except for a one in the \(k\)-th position. Show that classifying to the largest element of \(\hat y\) amounts to choosing the closest target, \(\operatorname{argmin}_k\|t_k-\hat y\|\), if the elements of \(\hat y\) sum to one.

Soln. 2.1

We need to prove:

\[\begin{equation} \underset{k}{\operatorname{argmax}} \hat y_k = \underset{k}{\operatorname{argmin}} \|t_k-\hat y\|^2 \label{eq:2-1a} \end{equation}\]

By definition of \(t_k\), we have

\[\begin{align} \|t_k-\hat y\|^2 &= (1-\hat y_k)^2 + \sum_{l \neq k }(0 - \hat y_l)^2\nonumber\\ &= 1 - 2\hat y_k + \hat y_k^2 + \sum_{l \neq k }\hat y_l^2\nonumber\\ &= 1 - 2\hat y_k + \sum_{l=1}^K\hat y_l^2 \label{eq:2-1b} \end{align}\]

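In \(\eqref{eq:2-1b}\), the terms \(1\) and \(\sum_{l=1}^K\hat y_l^2\) are the same for every \(k\); only \(-2\hat y_k\) depends on \(k\). Minimizing \(\|t_k-\hat y\|^2\) over \(k\) is therefore equivalent to maximizing \(\hat y_k\), which is exactly \(\eqref{eq:2-1a}\). Because the norm is nonnegative, minimizing the squared norm is the same as minimizing the norm itself, so this proves the statement in the exercise. If several entries of \(\hat y\) tie for the maximum, the corresponding targets tie for the minimum distance.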
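The equivalence can also be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `closest_target` and the random test vectors are illustrative choices, not part of the exercise.

```python
import numpy as np


def closest_target(y_hat):
    """Return the index k of the one-hot target t_k nearest to y_hat (Euclidean distance)."""
    K = len(y_hat)
    targets = np.eye(K)  # row k of the identity matrix is the target t_k
    dists = np.linalg.norm(targets - y_hat, axis=1)  # ||t_k - y_hat|| for each k
    return int(np.argmin(dists))


# Compare against argmax on random vectors that do not sum to one.
rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 5))
agree = all(closest_target(y) == int(np.argmax(y)) for y in samples)
```

Here `agree` should come out `True`, and `closest_target(np.array([0.2, 0.5, 0.3]))` picks index 1, the position of the largest entry. Exact ties are ignored in this check because they occur with probability zero for continuous random draws.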

Remark

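The assumption \(\sum_{k=1}^K\hat y_k=1\) is not actually required: the derivation of \(\eqref{eq:2-1b}\) never uses it, so the equivalence holds for any \(\hat y\in\mathbb{R}^K\).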
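As a small illustration (the numbers are chosen here only as an example), take \(K=3\) and \(\hat y=(3,-1,7)\), whose entries sum to \(9\) rather than \(1\). Then \(\sum_{l=1}^3\hat y_l^2=59\), and \(\eqref{eq:2-1b}\) gives

\[\begin{align} \|t_1-\hat y\|^2 &= 1 - 2\cdot 3 + 59 = 54\nonumber\\ \|t_2-\hat y\|^2 &= 1 - 2\cdot(-1) + 59 = 62\nonumber\\ \|t_3-\hat y\|^2 &= 1 - 2\cdot 7 + 59 = 46\nonumber \end{align}\]

The closest target is \(t_3\), and \(\hat y_3=7\) is the largest element of \(\hat y\), as expected.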