Ex. 3.19

Show that $\|\hat{\beta}^{\text{ridge}}\|$ increases as its tuning parameter $\lambda \to 0$. Does the same property hold for the lasso and partial least squares estimates? For the latter, consider the tuning parameter to be the successive steps in the algorithm.

Soln. 3.19

Recall the SVD $X = UDV^T$. Here $U$ is $N \times p$ with orthonormal columns, $V$ is a $p \times p$ orthogonal matrix, and $D$ is a $p \times p$ diagonal matrix with entries $d_1 \ge \cdots \ge d_p \ge 0$. So we have

$$\hat{\beta}^{\text{ridge}} = (X^TX + \lambda I)^{-1}X^Ty = (VD^2V^T + \lambda I)^{-1}VDU^Ty = \big(V(D^2 + \lambda I)V^T\big)^{-1}VDU^Ty = V(D^2 + \lambda I)^{-1}DU^Ty.$$

Therefore,

$$\|\hat{\beta}^{\text{ridge}}\|_2^2 = y^TUD(D^2 + \lambda I)^{-1}(D^2 + \lambda I)^{-1}DU^Ty = (U^Ty)^T\big[D(D^2 + \lambda I)^{-2}D\big](U^Ty) = \sum_{j=1}^p \frac{d_j^2 (U^Ty)_j^2}{(d_j^2 + \lambda)^2},$$

where $D(D^2 + \lambda I)^{-2}D$ is a diagonal matrix with elements $d_j^2/(d_j^2 + \lambda)^2$. Each summand $d_j^2 (U^Ty)_j^2/(d_j^2 + \lambda)^2$ is decreasing in $\lambda$, so we see that $\|\hat{\beta}^{\text{ridge}}\|_2$ increases as its tuning parameter $\lambda \to 0$.
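As a numerical sanity check (a minimal sketch on synthetic data; `X`, `y`, and the grid of $\lambda$ values are illustrative, not from the text), the snippet below confirms that the SVD formula above matches the direct ridge solve and that $\|\hat{\beta}^{\text{ridge}}\|_2$ grows monotonically as $\lambda$ decreases:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 5
X = rng.standard_normal((N, p))  # synthetic data for illustration
y = rng.standard_normal(N)

# Thin SVD: U is N x p with orthonormal columns, d holds d_1 >= ... >= d_p.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

def ridge_norm(lam):
    # ||beta_ridge||_2 via the closed form: sum_j d_j^2 (U^T y)_j^2 / (d_j^2 + lam)^2
    uty = U.T @ y
    return np.sqrt(np.sum(d**2 * uty**2 / (d**2 + lam) ** 2))

# The direct solve (X^T X + lam I)^{-1} X^T y agrees with the SVD formula.
lam = 1.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
assert np.isclose(np.linalg.norm(beta), ridge_norm(lam))

# The norm increases monotonically as lambda decreases toward 0.
lambdas = np.logspace(2, -4, 20)  # decreasing sequence of lambda values
norms = np.array([ridge_norm(l) for l in lambdas])
assert np.all(np.diff(norms) > 0)
print(norms)
```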

For the lasso there is no explicit solution as there is for ridge, but we can start with the orthonormal case, where the formula is given in Ex. 3.16: each coordinate is obtained by soft-thresholding the least squares estimate, $\hat{\beta}_j^{\text{lasso}} = \operatorname{sign}(\hat{\beta}_j)\,(|\hat{\beta}_j| - \lambda)_+$. From this it is easy to see that $\|\hat{\beta}^{\text{lasso}}\|_1$ increases as $\lambda \to 0$. For the general case, recall the constrained form (3.51) and the Lagrangian form (3.52) of the lasso in the text. The bound $t$ in (3.51) and the penalty $\lambda$ in (3.52) have an inverse relationship: as $\lambda \to 0$, $t$ increases, and so does the norm of the optimal solution (see Figure 3.11 for an intuitive illustration in $\mathbb{R}^2$).
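To illustrate the orthonormal case (again a sketch under stated assumptions: the QR-based design and the helper `lasso_orthonormal` are constructed for the demo, not taken from the text), soft-thresholding makes the monotonicity of the $\ell_1$ norm explicit:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 50, 5
# Orthonormal design: X^T X = I, so the lasso solution is soft thresholding.
X, _ = np.linalg.qr(rng.standard_normal((N, p)))
y = rng.standard_normal(N)
beta_ls = X.T @ y  # least squares estimate when X^T X = I

def lasso_orthonormal(lam):
    # Soft-threshold each coordinate: sign(b) * max(|b| - lam, 0).
    return np.sign(beta_ls) * np.maximum(np.abs(beta_ls) - lam, 0.0)

# The l1 norm of the lasso estimate is non-decreasing as lambda decreases,
# strictly increasing once at least one coordinate is active.
lambdas = np.linspace(1.0, 0.0, 21)  # decreasing sequence of lambda values
l1_norms = np.array([np.abs(lasso_orthonormal(l)).sum() for l in lambdas])
assert np.all(np.diff(l1_norms) >= 0)
print(l1_norms)
```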