A Solution Manual for ESL

Ex. 11.4

YuhangZhou88/ESL_Solution

Ex. 11.4

Consider a neural network for a \(K\) class outcome that uses cross-entropy loss. If the network has no hidden layer, show that the model is equivalent to the multinomial logistic model described in Chapter 4.

Soln. 11.4

From (11.5) in the text, if there is no hidden layer, the inputs \(X\) take the place of the hidden units \(Z\), so

\[\begin{equation} T_k = \beta_{0k} + \beta_k^TX, \ k=1,...,K\non \end{equation}\]

thus by (11.6) in the text

\[\begin{eqnarray} f_k(X) &=& g_k(T)\non\\ &=& \frac{\exp(\beta_{0k} + \beta_k^TX)}{\sum_{l=1}^K\exp(\beta_{0l} + \beta_l^TX)}.\non \end{eqnarray}\]

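This parametrization is redundant: adding the same constant and the same vector to every pair \((\beta_{0k}, \beta_k)\) leaves each \(f_k(X)\) unchanged. Dividing the numerator and denominator of \(f_k(X)\) by \(\exp(\beta_{0K} + \beta_K^TX)\), and writing \(\tilde\beta_{0k} = \beta_{0k} - \beta_{0K}\) and \(\tilde\beta_k = \beta_k - \beta_K\), we obtain

\[\begin{eqnarray} f_k(X) &=& \frac{\exp(\tilde\beta_{0k} + \tilde\beta_k^TX)}{1+\sum_{l=1}^{K-1}\exp(\tilde\beta_{0l} + \tilde\beta_l^TX)},\ k=1,...,K-1,\non\\ f_K(X) &=& \frac{1}{1+\sum_{l=1}^{K-1}\exp(\tilde\beta_{0l} + \tilde\beta_l^TX)},\non \end{eqnarray}\]

which is exactly the multinomial logistic model studied in Ex. 4.4. Moreover, with a 0-1 coded response \(y_{ik}\), the cross-entropy loss \(-\sum_{i}\sum_{k} y_{ik}\log f_k(x_i)\) is the negative multinomial log-likelihood, so fitting the network by minimizing cross-entropy is the same as maximum likelihood estimation in that model.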
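
The equivalence can also be checked numerically. The following is a minimal sketch, not part of the original solution: it draws arbitrary coefficients for a network with no hidden layer, forms the \(K-1\) coefficient differences relative to class \(K\) (called `gamma0` and `gamma` here, names chosen only for this illustration), and compares the softmax outputs with the probabilities of the multinomial logistic model. The dimensions, the random seed, and the observed class `g` are assumptions made for the example.

```python
import math
import random

def softmax_probs(beta0, beta, x):
    """Outputs of a no-hidden-layer network: softmax of T_k = beta0[k] + beta[k]^T x."""
    t = [b0 + sum(b * xi for b, xi in zip(bk, x)) for b0, bk in zip(beta0, beta)]
    t_max = max(t)  # subtracting the max leaves the softmax unchanged and avoids overflow
    e = [math.exp(tk - t_max) for tk in t]
    total = sum(e)
    return [ek / total for ek in e]

def multilogit_probs(gamma0, gamma, x):
    """Multinomial logistic model with the last class as reference (K-1 coefficient vectors)."""
    e = [math.exp(g0 + sum(g * xi for g, xi in zip(gk, x))) for g0, gk in zip(gamma0, gamma)]
    denom = 1.0 + sum(e)
    return [ek / denom for ek in e] + [1.0 / denom]

random.seed(0)   # arbitrary seed, only for reproducibility
K, p = 4, 3      # hypothetical number of classes and inputs
beta0 = [random.gauss(0, 1) for _ in range(K)]
beta = [[random.gauss(0, 1) for _ in range(p)] for _ in range(K)]
x = [random.gauss(0, 1) for _ in range(p)]

# Reparametrize relative to class K: differences of intercepts and of coefficient vectors.
gamma0 = [beta0[k] - beta0[-1] for k in range(K - 1)]
gamma = [[beta[k][j] - beta[-1][j] for j in range(p)] for k in range(K - 1)]

f = softmax_probs(beta0, beta, x)        # network outputs f_1, ..., f_K
q = multilogit_probs(gamma0, gamma, x)   # multinomial logistic probabilities

# Largest absolute difference between the two sets of class probabilities.
max_gap = max(abs(a - b) for a, b in zip(f, q))

# Cross-entropy for one observation of class g (0-based index, hypothetical)
# versus the negative log-likelihood under the multinomial logistic model.
g = 1
ce_network = -math.log(f[g])
nll_multilogit = -math.log(q[g])

print(max_gap, abs(ce_network - nll_multilogit))
```

Both printed values should be zero up to floating-point rounding, reflecting that the two models assign the same class probabilities and hence the same loss to any observation.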