Ex. 15.5
Ex. 15.5
Show that the sampling correlation between a pair of random forest trees at a point \(x\) is given by
\[\begin{equation}
\rho(x) = \frac{\text{Var}_{\bb{Z}}[E_{\Theta|\bb{Z}}T(x;
\Theta(\bb{Z}))]}{\text{Var}_{\bb{Z}}[E_{\Theta|\bb{Z}}T(x;
\Theta(\bb{Z}))] + E_\bb{Z}\text{Var}_{\Theta|\bb{Z}}[T(x;\Theta(\bb{Z}))]}.\non
\end{equation}\]
The term in the numerator is \(\text{Var}_\bb{Z}[\hat f_{\text{rf}}(x)]\), and the second term in the denominator is the expected conditional variance due to the randomization in random forests.
Soln. 15.5
Recall (15.6) in the text, we have
\[\begin{eqnarray}
\rho(x) &=& \text{corr}[T(x;\Theta_1(\bb{Z})), T(x;\Theta_2(\bb{Z}))]\non\\
&=&\frac{\text{cov}(T(x;\Theta_1(\bb{Z})), T(x;\Theta_2(\bb{Z})))}{\sqrt{\text{var}(T(x;\Theta_1(\bb{Z})))\text{var}(T(x;\Theta_2(\bb{Z})))}}.\non
\end{eqnarray}\]
Note that
\[\begin{eqnarray}
&&\text{cov}(T(x;\Theta_1(\bb{Z})), T(x;\Theta_2(\bb{Z})))\non\\
&=&E_{\bb{Z}}[\text{cov}_{\Theta|\bb{Z}}(T(x;\Theta_1(\bb{Z})), T(x;\Theta_2(\bb{Z})))]\non\\
&&+\text{cov}_{\bb{Z}}(E_{\Theta_1|\bb{Z}}[T(x;\Theta_1(\bb{Z})],E_{\Theta_2|\bb{Z}}[T(x;\Theta_2(\bb{Z})])\non\\
&=&0+\text{cov}_{\bb{Z}}(E_{\Theta_1|\bb{Z}}[T(x;\Theta_1(\bb{Z})],E_{\Theta_2|\bb{Z}}[T(x;\Theta_2(\bb{Z})])\non\\
&=&\text{Var}_{\bb{Z}}[E_{\Theta|\bb{Z}}T(x;
\Theta(\bb{Z}))],\non
\end{eqnarray}\]
where the last equation follows from \(T(x;\Theta_1(\bb{Z}))\) and \(T(x;\Theta_2(\bb{Z}))\) are independent and have the same distribution. This is the numerator in the formula. The denominator follows directly from (15.9) in the text.