A Solution Manual for ESL

Ex. 7.6

YuhangZhou88/ESL_Solution

Ex. 7.6

Show that for an additive-error model, the effective degrees-of-freedom for the \(k\)-nearest-neighbors regression fit is \(N/k\).

Soln. 7.6

The \(k\)-nearest-neighbors fit is a linear smoother. To see this, write the prediction at a point \(x\) as

\[\begin{equation} \hat Y(x) = \frac{1}{k}\sum_{i: x_i\in N_k(x)}y_i = \frac{1}{k}\sum_{i=1}^N\eta_i y_i\non \end{equation}\]

where \(\eta_i = 1\) if \(x_i\in N_k(x)\) and 0 otherwise.

Evaluating this at each training point \(x_1, \dots, x_N\) and stacking the fitted values, we can write

\[\begin{equation} \hat Y = \frac{1}{k}\bb{S}\bb{y}\non \end{equation}\]

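in which \(\bb{S}\) is a binary \(N\times N\) matrix with \(\bb{S}_{ij} = 1\) if \(x_j\in N_k(x_i)\) and 0 otherwise. Its diagonal elements are all 1, since each training point is its own nearest neighbor (assuming the \(x_i\) are distinct). For an additive-error model \(Y = f(X) + \epsilon\), the effective degrees-of-freedom of a linear fit \(\hat Y = \bb{H}\bb{y}\) is \(\text{trace}(\bb{H})\); here \(\bb{H} = \frac{1}{k}\bb{S}\), so the effective df is simply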

\[\begin{equation} \frac{1}{k}\text{trace}(\bb{S}) = \frac{N}{k}.\non \end{equation}\]
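The result can be checked numerically. The following is a minimal sketch, not part of the original solution: it assumes NumPy is available, uses Euclidean distance, and draws distinct random training inputs so that each point is its own nearest neighbor. The helper name `knn_smoother_matrix` and the values of `N` and `k` are hypothetical choices made for illustration.

```python
import numpy as np


def knn_smoother_matrix(x, k):
    """Return the N x N matrix (1/k) S of the k-NN regression fit on inputs x.

    Hypothetical helper for illustration; assumes the rows of x are distinct.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0]
    # Pairwise squared Euclidean distances between training points.
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    # For each training point, the indices of its k nearest training points.
    # The point itself is at distance 0, so it is always included.
    nn = np.argsort(d2, axis=1, kind="stable")[:, :k]
    S = np.zeros((n, n))
    S[np.arange(n)[:, None], nn] = 1.0
    return S / k


rng = np.random.default_rng(0)
N, k = 50, 5  # assumed sizes for the check
x = rng.normal(size=(N, 2))
y = rng.normal(size=N)

smoother = knn_smoother_matrix(x, k)
fitted = smoother @ y          # k-NN fitted values at the training points
df = np.trace(smoother)        # effective degrees of freedom

print(df, N / k)
```

Each row of `smoother` has exactly \(k\) entries equal to \(1/k\), one of them on the diagonal, so the trace is the sum of \(N\) copies of \(1/k\). With tied inputs the diagonal entry is no longer guaranteed, which is why the sketch assumes distinct points.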