Ex. 3.28

Ex. 3.28

Suppose for a given \(t\) in (3.51), the fitted lasso coefficient for variable \(X_j\) is \(\hat\beta_j=a\). Suppose we augment our set of variables with an identical copy \(X^\ast_j=X_j\). Characterize the effect of this exact collinearity by describing the set of solutions for \(\hat\beta_j\) and \(\hat\beta_j^\ast\), using the same value of \(t\).

Soln. 3.28

The original lasso problem is

\[\begin{eqnarray} \hat\beta^{\text{lasso}} = &&\underset{\beta}{\operatorname{argmin}}\sum_{i=1}^N\left(y_i-\beta_0-\sum_{k=1}^px_{ik}\beta_k\right)^2\non\\ &&\text{s.t.} \ \sum_{k=1}^p|\beta_k|\le t.\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (P_0)\non \end{eqnarray}\]

We know that \(\hat\beta^{\text{lasso}}_j =a\). When an identical copy \(X^\ast_j=X_j\) is included, the problem becomes

\[\begin{eqnarray} \hat\beta^{\text{new}} = &&\underset{\beta}{\operatorname{argmin}}\sum_{i=1}^N\left(y_i-\beta_0-\sum_{k=1}^px_{ik}\beta_k - x_{ij}\beta_j^\ast\right)^2\non\\ &&\text{s.t.} \ \sum_{k=1}^p|\beta_k| + |\beta_j^\ast|\le t.\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (P_1)\non \end{eqnarray}\]

Denote \(\tilde\beta_j = \beta_j + \beta_j^\ast\), the problem \(P_1\) can be rewritten as

\[\begin{eqnarray} \hat\beta^{\text{new}} = &&\underset{\beta}{\operatorname{argmin}}\sum_{i=1}^N\left(y_i-\beta_0-\sum_{k\neq j}^px_{ik}\beta_k - x_{ij}\tilde\beta_j\right)^2\non\\ &&\text{s.t.} \ \sum_{k\neq j}^p|\beta_k| + |\tilde\beta_j| + (|\beta_j|+|\beta_j^\ast|-|\tilde\beta_j|)\le t.\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (P_2)\non \end{eqnarray}\]

Comparing \(P_2\) with original lasso problem \(P_0\) we see that the objective is the same while the constraint of \(P_2\) is more strict than that of \(P_0\) because \(|\beta_j|+|\beta_j^\ast|-|\tilde\beta_j|\ge 0\).

On the other hand, note that by symmetry, we have \(\hat\beta_j = \hat\beta_j^\ast\). Given an optimal solution \(\hat\beta^{\text{lasso}}\) to original problem \(P_0\), we can set \(\beta_j=\beta_j^\ast=\frac{1}{2}\hat\beta^{\text{lasso}}_j=\frac{a}{2}\). In that case, we obtain an optimal solution to problem \(P_2\) because \(|\beta_j|+|\beta_j^\ast|-|\tilde\beta_j| = |a/2|+|a/2|-|a|=0\).

Therefore we see that if we include an identical copy of \(X_j^\ast = X_j\), the coefficients for other variables \(X_k\) for \(k\neq j\) remain the same while the coefficients for \(X_j^\ast\) and \(X_j\) are the half of the original.