Ex. 16.1

Describe exactly how to generate the block correlated data used in the simulation in Section 16.2.3.

Soln. 16.1

First we need to simulate \(\bb{X}\) from a multi-dimensional Gaussian distribution with mean zero and covariance matrix \(C\in \mathbb{R}^{1000\times 1000}\). Specifically, \(C\) is a blockwise matrix

\[\begin{equation} C = \begin{pmatrix} C_1 & \bb{0} & \cdots & \bb{0}\\ \bb{0} & C_2 & \cdots & \bb{0}\\ \vdots & \vdots & \ddots & \vdots\\ \bb{0} & \bb{0} & \cdots & C_{20} \end{pmatrix},\non \end{equation}\]

where each \(C_i\) being the same

\[\begin{equation} C_i = \begin{pmatrix} 1 & \rho & \cdots & \rho\\ \rho & 1 & \cdots & \rho\\ \vdots & \vdots & \ddots & \vdots\\ \rho & \rho & \cdots & 1 \end{pmatrix}.\non \end{equation}\]

Note that \(\rho=0.95\) in this example. Recall that in Ex. 15.1 we proved that \(-\frac{1}{50-1}\le \rho\le 1\).

Given simulated \(\bb{X}\in \mathbb{R}^{1000\times 1}\), write \(\bb{X}^T=(x_1, x_2,...,x_{1000})\). We randomly pick \(x_1^\ast\) from \(x_1,...,x_{20}\), and then randomly pick \(x_2^\ast\) from \(x_{21}, ..., x_{40}\), and so on and so forth. We end up with \((x_1^\ast, x_2^\ast,...,x_{50}^\ast)\) being our variables.

Next we simulate \(\beta\in\mathbb{R}^{50\times 1}\) from a standard multi-dimensional Gaussian distribution, and \(\epsilon\in\mathbb{R}^{50\times 1}\) from a multi-dimensional Gaussian distribution with mean zero and covariance matrix \(\sigma^2\bb{I}_{50}\).

The data model can be written as

\[\begin{equation} Y = f(X^\ast) + \epsilon,\non \end{equation}\]

with \(f(X)=\beta^TX^\ast\).

The value of \(\sigma^2\) is determined by the noise-to-signal ratio \(\frac{\sigma^2}{\text{Var}(f(X^\ast))}=0.72\) (see, e.g., (11.18) in the text), so that \(\sigma^2=50 * 0.72= 36\).