A Solution Manual for ESL

Ex. 10.11

YuhangZhou88/ESL_Solution

Ex. 10.11

Show how to compute the partial dependence function \(f_{\mathcal{S}}(X_{\mathcal{S}})\) in (10.47) efficiently.

Soln. 10.11

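In general, to evaluate the partial dependence in (10.47) at a given value \(x_{\mathcal{S}}\) of \(X_{\mathcal{S}}\), we need to make \(N\) predictions, one for each of the \(N\) points \((x_{\mathcal{S}}, x_{i\mathcal{C}})\), \(i=1,...,N\), and take their average, that is, \(\frac{1}{N}\sum_{i=1}^N f(x_{\mathcal{S}}, x_{i\mathcal{C}})\). This has to be repeated for every value of \(x_{\mathcal{S}}\) of interest.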
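The brute-force average described above can be sketched in a few lines of Python. This is only an illustration: the function name `partial_dependence_brute`, the toy model `toy_model`, and the data `X_train` are assumptions made up for this example and do not come from ESL or from any library.

```python
from typing import Callable, List, Sequence


def partial_dependence_brute(
    predict: Callable[[List[float]], float],
    X: Sequence[Sequence[float]],
    s_idx: Sequence[int],
    x_s: Sequence[float],
) -> float:
    """Average predict(x_S, x_iC) over the N rows of X."""
    total = 0.0
    for row in X:
        point = list(row)              # copy so the training data is untouched
        for j, value in zip(s_idx, x_s):
            point[j] = value           # overwrite the S-coordinates with x_S
        total += predict(point)        # one prediction per training row
    return total / len(X)


# Hypothetical model, used only for illustration: f(x) = 2*x0 + x0*x1
def toy_model(x: List[float]) -> float:
    return 2.0 * x[0] + x[0] * x[1]


X_train = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]

# Partial dependence on feature 0, evaluated at x0 = 1.0
pd_value = partial_dependence_brute(toy_model, X_train, s_idx=[0], x_s=[1.0])
```

Each evaluation costs \(N\) calls to the model, which is what the tree-based shortcut below avoids.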

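For decision trees, each node of the fitted tree records how many training samples passed through it during training, so the average can be formed from the associated proportions instead of from \(N\) separate predictions. Concretely, at a split on a variable in \(\mathcal{S}\) we follow the branch selected by \(x_{\mathcal{S}}\); at a split on a variable in \(\mathcal{C}\) we follow both branches, weighting each by the fraction of that node's training samples that went to the corresponding child. The weighted sum of the leaf values reached is the result, so a single traversal of the tree suffices. See "Efficient Partial Dependence Plots with decision trees" for a detailed description and scikit-learn's implementation.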
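A minimal sketch of this weighted traversal, on a small hand-built tree, might look as follows. The `Node` class and `partial_dependence_tree` function are hypothetical names introduced for illustration; they are not scikit-learn's data structures or API, only a toy version of the same idea.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    n_samples: int                     # training samples that reached this node
    value: float = 0.0                 # prediction (used only at leaves)
    feature: Optional[int] = None      # split feature index; None marks a leaf
    threshold: float = 0.0             # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def partial_dependence_tree(node: Node, x_s: Dict[int, float]) -> float:
    """Weighted traversal; x_s maps each feature index in S to its value."""
    if node.feature is None:
        return node.value
    if node.feature in x_s:
        # Split on a variable in S: follow the branch chosen by x_S.
        child = node.left if x_s[node.feature] <= node.threshold else node.right
        return partial_dependence_tree(child, x_s)
    # Split on a variable in C: visit both children, weighted by the
    # fraction of this node's training samples that went each way.
    w_left = node.left.n_samples / node.n_samples
    w_right = node.right.n_samples / node.n_samples
    return (w_left * partial_dependence_tree(node.left, x_s)
            + w_right * partial_dependence_tree(node.right, x_s))


# Toy tree: the root splits on feature 1 (in C); its left child splits on
# feature 0 (in S); all sample counts and leaf values are made up.
tree = Node(
    n_samples=10, feature=1, threshold=0.0,
    left=Node(
        n_samples=4, feature=0, threshold=0.5,
        left=Node(n_samples=1, value=1.0),
        right=Node(n_samples=3, value=3.0),
    ),
    right=Node(n_samples=6, value=5.0),
)

pd_low = partial_dependence_tree(tree, {0: 0.2})   # 0.4 * 1.0 + 0.6 * 5.0
pd_high = partial_dependence_tree(tree, {0: 0.9})  # 0.4 * 3.0 + 0.6 * 5.0
```

The cost of one evaluation depends on the size of the tree rather than on \(N\), since no training rows are touched after fitting.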