Ex. 15.5

Show that the sampling correlation between a pair of random forest trees at a point \(x\) is given by

\[\begin{equation} \rho(x) = \frac{\text{Var}_{\bb{Z}}[E_{\Theta|\bb{Z}}T(x; \Theta(\bb{Z}))]}{\text{Var}_{\bb{Z}}[E_{\Theta|\bb{Z}}T(x; \Theta(\bb{Z}))] + E_\bb{Z}\text{Var}_{\Theta|\bb{Z}}[T(x;\Theta(\bb{Z}))]}.\non \end{equation}\]

The term in the numerator is \(\text{Var}_\bb{Z}[\hat f_{\text{rf}}(x)]\), and the second term in the denominator is the expected conditional variance due to the randomization in random forests.

Soln. 15.5

Recall from (15.6) in the text that

\[\begin{eqnarray} \rho(x) &=& \text{corr}[T(x;\Theta_1(\bb{Z})), T(x;\Theta_2(\bb{Z}))]\non\\ &=&\frac{\text{cov}(T(x;\Theta_1(\bb{Z})), T(x;\Theta_2(\bb{Z})))}{\sqrt{\text{var}(T(x;\Theta_1(\bb{Z})))\text{var}(T(x;\Theta_2(\bb{Z})))}}.\non \end{eqnarray}\]

By the law of total covariance, conditioning on \(\bb{Z}\),

\[\begin{eqnarray} &&\text{cov}(T(x;\Theta_1(\bb{Z})), T(x;\Theta_2(\bb{Z})))\non\\ &=&E_{\bb{Z}}[\text{cov}_{\Theta_1,\Theta_2|\bb{Z}}(T(x;\Theta_1(\bb{Z})), T(x;\Theta_2(\bb{Z})))]\non\\ &&+\text{cov}_{\bb{Z}}(E_{\Theta_1|\bb{Z}}[T(x;\Theta_1(\bb{Z}))],E_{\Theta_2|\bb{Z}}[T(x;\Theta_2(\bb{Z}))])\non\\ &=&0+\text{cov}_{\bb{Z}}(E_{\Theta_1|\bb{Z}}[T(x;\Theta_1(\bb{Z}))],E_{\Theta_2|\bb{Z}}[T(x;\Theta_2(\bb{Z}))])\non\\ &=&\text{Var}_{\bb{Z}}[E_{\Theta|\bb{Z}}T(x; \Theta(\bb{Z}))],\non \end{eqnarray}\]

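where the conditional covariance term vanishes because, given \(\bb{Z}\), \(\Theta_1\) and \(\Theta_2\) are drawn independently. The last equality holds because \(\Theta_1\) and \(\Theta_2\) have the same distribution given \(\bb{Z}\). Hence \(E_{\Theta_1|\bb{Z}}[T(x;\Theta_1(\bb{Z}))] = E_{\Theta_2|\bb{Z}}[T(x;\Theta_2(\bb{Z}))] = E_{\Theta|\bb{Z}}T(x;\Theta(\bb{Z}))\), and the covariance of a random variable with itself is its variance. This gives the numerator in the formula. For the denominator, the two trees have the same marginal distribution, so \(\text{var}(T(x;\Theta_1(\bb{Z}))) = \text{var}(T(x;\Theta_2(\bb{Z})))\) and the square root reduces to \(\text{Var}_{\Theta,\bb{Z}}T(x;\Theta(\bb{Z}))\). By the law of total variance, (15.9) in the text,

\[\begin{equation} \text{Var}_{\Theta,\bb{Z}}T(x;\Theta(\bb{Z})) = \text{Var}_{\bb{Z}}[E_{\Theta|\bb{Z}}T(x;\Theta(\bb{Z}))] + E_{\bb{Z}}\text{Var}_{\Theta|\bb{Z}}[T(x;\Theta(\bb{Z}))],\non \end{equation}\]

which is the denominator.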
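As an informal numerical sanity check, outside the original solution, the sketch below replaces a random forest tree with a hypothetical stand-in. The training set \(\bb{Z}\) is \(n\) i.i.d. standard normal values, \(\Theta\) is a bootstrap resample of \(\bb{Z}\), and the "tree" \(T\) is the resample mean, so it does not depend on \(x\). For this toy model, \(E_{\Theta|\bb{Z}}T = \bar z\), so the numerator is \(\text{Var}(\bar z) = 1/n\). Also \(\text{Var}_{\Theta|\bb{Z}}T = \hat\sigma^2/n\), where \(\hat\sigma^2 = \frac1n\sum_i (z_i-\bar z)^2\) has expectation \((n-1)/n\). The formula therefore predicts \(\rho = n/(2n-1)\). The function names `bootstrap_mean`, `pearson`, `simulate_rho` and `formula_rho` are illustrative choices, not from the text.

```python
import random


def bootstrap_mean(z, rng):
    """Hypothetical 'tree': the mean of one bootstrap resample of z (Theta = the resample)."""
    n = len(z)
    return sum(z[rng.randrange(n)] for _ in range(n)) / n


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_rho(n=10, reps=20000, seed=0):
    """Monte Carlo estimate of corr[T(Theta_1(Z)), T(Theta_2(Z))] over random Z."""
    rng = random.Random(seed)
    t1, t2 = [], []
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # a fresh training set Z
        t1.append(bootstrap_mean(z, rng))  # tree grown with Theta_1 given Z
        t2.append(bootstrap_mean(z, rng))  # Theta_2 drawn independently of Theta_1 given Z
    return pearson(t1, t2)


def formula_rho(n):
    """rho from the exercise for this toy model:
    Var_Z E[T|Z] = 1/n and E_Z Var[T|Z] = (n-1)/n^2."""
    between = 1.0 / n
    within = (n - 1) / n**2
    return between / (between + within)


if __name__ == "__main__":
    n = 10
    print(f"simulated rho: {simulate_rho(n):.3f}")
    print(f"formula rho:   {formula_rho(n):.3f}")  # 10/19 ≈ 0.526
```

The simulated value should land close to \(10/19 \approx 0.526\) for \(n = 10\). The exact simulated digits depend on the seed and the number of replicates. Only the correlation across draws of \(\bb{Z}\) is estimated here, which is what \(\rho(x)\) measures. A single fixed training set would give correlation near zero, because the trees are conditionally independent.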