Ex. 5.16

Ex. 5.16

Consider the ridge regression problem (5.53), and assume MN. Assume you have a kernel K that computes the inner product K(x,y)=m=1Mhm(x)hm(y).

(a)

Derive (5.62) on page 171 in the text. How would you compute the matrices V and Dγ, given K? Hence show that (5.63) is equivalent to (5.53).

(b)

Show that

f^=Hβ^=K(K+λI)1y,

where H is the N×M matrix of evaluations hm(xi), and K=HHT the N×N matrix of inner-product h(xi)Th(xj).

(c)

Show that

f^(x)=h(x)Tβ^=i=1NK(x,xi)αi^

and α^=(K+λI)1y.

(d)

How would you modify your solution if M<N?

Soln. 5.16
(a)

By definition of the kernel K, we have

K(x,y)=m=1Mhm(x)hm(y)=i=1γiϕi(x)ϕi(y).

Multiply each summand above by ϕk(x) and calculate K(x,y),ϕk(x),

(1)m=1Mhm(x),ϕk(x)hm(y)=i=1ϕi(x),ϕk(x)ϕi(y).

Since {ϕi,i=1,...,} are orthogonal, we have

ϕi(x),ϕk(x)={1 if i=k,0 otherwise.

Thus, (1) becomes

m=1Mhm(x),ϕk(x)hm(y)=γkϕk(y).

Let gkm=hm(x),ϕk(x) and calculate K(x,y),ϕl(y), we get

m=1Mgkmhm(y)=γkϕk(y),m=1Mgkmhm(y),ϕl(y)=γkϕk(y),ϕl(y),m=1Mgkmglm=γkδk,l

where δk,l=1 if k=l and 0 otherwise.

Let GM={gnm}RM×N, we have

GMGMT=diag{γ1,γ2,...,γM}=Dγ.

Let VT=Dγ12GM, we have

VVTGMT=GMTDγ1GM=IN.

Let h(x)=(h1(x),h2(x),...,hM(x))T and ϕ(x)=(ϕ1(x),ϕ2(x),...,ϕM(x))T, then the three equations above can be rewritten as

GMh(x)=Dγϕ(x)VDγ12GMh(x)=VDγ12Dγϕ(x)h(x)=VDγ12.

To show that (5.63) is equivalent to (5.53) in the text, we start with (5.63). Let β=(β1,β2,...,βm)T and c=Dγ12VTβ,

min{βm}1Mi=1N(yim=1Mβmhm(xi))2+λm=1Mβm2=minβi=1N(yiβTh(xi))2+λβTβ=minβi=1N(yiβTVDγ12ϕ(xi))2+λβTβ=minci=1N(yicTϕ(xi))2+λ(VDγ12c)TVDγ12c=minci=1N(yicTϕ(xi))2+λcTcDγ1=min{cj}1i=1N(yij=1cjϕj(xi))2+λj=1cj2γj,

which is (5.53) in the text.

(b)

Recall that in (a) we have

minβi=1N(yiβTh(xi))2+λβTβ.

Taking derivative w.r.t β and setting it to be zero yields

HT(yHβ^)+λβ^=0.

Thus we have

β^=(HTH+λI)1HTy

and

f^=H(HTH+λI)1HTy.

By Woodbury matrix identity, we have

(HTH+λI)1=1λI1λIHT(I+1λHHT)1H1λI.

Therefore, we have

f^=1λHHTy1λHHT(λI+HHT)1HHTy=1λHHT[I(λI+HHT)1HHT]y=1λHHT[(λI+HHT)1(λI+HHT)(λI+HHT)1HHT]y=1λHHT[(λI+HHT)1λI]y=HHT(λI+HHT)1y=K(K+λI)1y.
(c)

This is directly derived from (b).

(d)

The solution remains the same as K+λI is invertible as long as λ0. When λ=0 however, we have

f^=Hβ^=H(HTH)1HTy=y.