Ex. 15.4

Suppose \(x_i, i=1,\ldots,n\) are iid \((\mu, \sigma^2)\). Let \(\bar x_1^\ast\) and \(\bar x_2^\ast\) be two bootstrap realizations of the sample mean. Show that the sampling correlation \(\text{corr}(\bar x_1^\ast, \bar x_2^\ast)=\frac{n}{2n-1}\approx 50\%\). Along the way, derive \(\text{var}(\bar x_1^\ast)\) and the variance of the bagged mean \(\bar x_{\text{bag}}\). Here \(\bar x\) is a linear statistic; bagging produces no reduction in variance for linear statistics.

Soln. 15.4

Denote

\[\begin{eqnarray} \bar x_1^\ast = \frac{1}{n}\sum_{i=1}^n\hat x_i,\ \ \bar x_2^\ast = \frac{1}{n}\sum_{i=1}^n\tilde x_i,\non \end{eqnarray}\]

where \(\{\hat x_i, i=1,...,n\}\) and \(\{\tilde x_i, i=1,...,n\}\) are realizations from the first and the second bootstrap respectively.

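Note that both \(\hat x_i\) and \(\tilde x_i\) are drawn uniformly, with replacement, from \(\{x_i, i=1,...,n\}\). Marginally, that is, averaging over both the original sample and the resampling, each draw has the same distribution as \(x_1\).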

Therefore, for any \(1\le i\le n\), we have

\[\begin{eqnarray} E[\hat x_i] &=& E[\tilde x_i] = \mu,\non\\ \text{var}(\hat x_i) &=& \text{var}(\tilde x_i) = \sigma^2.\non \end{eqnarray}\]

Also, for any \(1\le i, j\le n\), we have

\[\begin{eqnarray} \text{cov}(\hat x_i, \tilde x_j) = \frac{\sigma^2}{n}.\non \end{eqnarray}\]
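One way to justify this covariance, as a sketch: write \(\hat x_i = x_I\) and \(\tilde x_j = x_J\), where the index variables \(I\) and \(J\) (notation introduced here only for illustration) are independent, uniform on \(\{1,\ldots,n\}\), and independent of the data. Then \(P(I=J)=1/n\), and conditioning on whether the two indices coincide gives

\[\begin{eqnarray} \text{cov}(x_I, x_J) &=& E[x_I x_J] - \mu^2\non\\ &=& \frac{1}{n}E[x_1^2] + \left(1-\frac{1}{n}\right)\mu^2 - \mu^2\non\\ &=& \frac{1}{n}(\sigma^2+\mu^2) - \frac{1}{n}\mu^2 = \frac{\sigma^2}{n}.\non \end{eqnarray}\]

The same argument covers two distinct draws \(\hat x_j, \hat x_k\) (\(j\neq k\)) within a single bootstrap sample, since their indices are also independent.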

Then, we know

\[\begin{eqnarray} \text{cov}(\bar x_1^\ast, \bar x_2^\ast) = \frac{1}{n^2}\left(\sum_{i=1}^n\sum_{j=1}^n\text{cov}(\hat x_i, \tilde x_j)\right)=\frac{1}{n^2}\cdot n^2\cdot\frac{\sigma^2}{n}=\frac{\sigma^2}{n},\non \end{eqnarray}\]

and

\[\begin{eqnarray} \text{var}(\bar x_1^\ast) &=&\text{var}\left(\frac{1}{n}\sum_{i=1}^n\hat x_i\right)\non\\ &=&\frac{1}{n^2}\left(\sum_{i=1}^n\text{var}(\hat x_i) + \sum_{j\neq k}^n\text{cov}(\hat x_j, \hat x_k)\right)\non\\ &=&\frac{1}{n^2}\left(n\cdot \sigma^2 + (n^2-n)\cdot \frac{\sigma^2}{n}\right)\non\\ &=&\frac{(2n-1)\sigma^2}{n^2}.\non \end{eqnarray}\]

Therefore we know that

\[\begin{equation} \text{corr}(\bar x_1^\ast, \bar x_2^\ast) = \frac{\text{cov}(\bar x_1^\ast, \bar x_2^\ast)}{\sqrt{\text{var}(\bar x_1^\ast)\text{var}(\bar x_2^\ast)}} = \frac{n}{2n-1}.\non \end{equation}\]
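As a quick arithmetic check of the "\(\approx 50\%\)" claim (the values of \(n\) below are chosen only for illustration):

\[\begin{eqnarray} \frac{n}{2n-1} = \frac{1}{2-1/n}\to\frac{1}{2}\ \text{ as } n\to\infty,\qquad \frac{10}{19}\approx 0.526,\quad \frac{100}{199}\approx 0.503.\non \end{eqnarray}\]

So the correlation is always slightly above \(1/2\) and approaches it quickly as \(n\) grows.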

The variance \(\text{var}(\bar x_1^\ast)\) was derived above. For the bagged mean, suppose we average \(B\) bootstrap realizations, \(\bar x_{\text{bag}} = \frac{1}{B}\sum_{i=1}^B\bar x_i^\ast\). Then

\[\begin{eqnarray} \text{var}(\bar x_{\text{bag}}) &=& \text{var}\left(\frac{1}{B}\sum_{i=1}^B\bar x_i^\ast\right)\non\\ &=&\frac{1}{B^2}\sum_{i=1}^B\text{var}(\bar x_i^\ast) + \frac{1}{B^2}\sum_{j\neq k}^B\text{cov}(\bar x_j^\ast, \bar x_k^\ast)\non\\ &=&\frac{1}{B}\cdot\frac{(2n-1)\sigma^2}{n^2}+\frac{B-1}{B}\cdot \frac{\sigma^2}{n}\non\\ &=&\frac{(2n-1)+(B-1)n}{Bn^2}\cdot \sigma^2.\non \end{eqnarray}\]
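To connect this with the remark in the exercise, the last expression can be regrouped. The following is a sketch that uses only the quantities derived above, together with the standard fact that the ordinary sample mean \(\bar x\) has \(\text{var}(\bar x)=\sigma^2/n\):

\[\begin{eqnarray} \text{var}(\bar x_{\text{bag}}) &=& \frac{Bn + (n-1)}{Bn^2}\cdot\sigma^2\non\\ &=& \frac{\sigma^2}{n} + \frac{(n-1)\sigma^2}{Bn^2}\ \ge\ \frac{\sigma^2}{n} = \text{var}(\bar x).\non \end{eqnarray}\]

With \(B=1\) this reduces to \(\text{var}(\bar x_1^\ast)=\frac{(2n-1)\sigma^2}{n^2}\), and as \(B\to\infty\) it decreases to \(\sigma^2/n\). The bagged mean therefore never has smaller variance than the plain sample mean, which is consistent with the statement that bagging gives no variance reduction for linear statistics.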