Ex. 7.4

Consider the in-sample prediction error (7.18) and the training error $\overline{\text{err}}$ in the case of squared-error loss:

$$\text{Err}_{\text{in}} = \frac{1}{N}\sum_{i=1}^N E_{Y^0}\left(Y_i^0 - \hat{f}(x_i)\right)^2, \qquad \overline{\text{err}} = \frac{1}{N}\sum_{i=1}^N \left(y_i - \hat{f}(x_i)\right)^2.$$

Add and subtract $f(x_i)$ and $E\hat{f}(x_i)$ in each expression and expand. Hence establish that the average optimism in the training error is

$$\frac{2}{N}\sum_{i=1}^N \operatorname{Cov}(\hat{y}_i, y_i),$$

as given in (7.21).

Soln. 7.4

We start with $\text{Err}_{\text{in}}$. Let's denote $\hat{y}_i = \hat{f}(x_i)$ and write

$$Y_i^0 - \hat{f}(x_i) = \left(Y_i^0 - f(x_i)\right) + \left(f(x_i) - E\hat{y}_i\right) + \left(E\hat{y}_i - \hat{y}_i\right),$$

so that

$$\text{Err}_{\text{in}} = \frac{1}{N}\sum_{i=1}^N E_{Y^0}\left(Y_i^0 - f(x_i) + f(x_i) - E\hat{y}_i + E\hat{y}_i - \hat{y}_i\right)^2 = \frac{1}{N}\sum_{i=1}^N \left(A_i + B_i + C_i + D_i + E_i + F_i\right),$$

where

$$\begin{aligned}
A_i &= E_{Y^0}\left(Y_i^0 - f(x_i)\right)^2\\
B_i &= E_{Y^0}\left(f(x_i) - E\hat{y}_i\right)^2 = \left(f(x_i) - E\hat{y}_i\right)^2\\
C_i &= E_{Y^0}\left(E\hat{y}_i - \hat{y}_i\right)^2 = \left(E\hat{y}_i - \hat{y}_i\right)^2\\
D_i &= 2E_{Y^0}\left(Y_i^0 - f(x_i)\right)\left(f(x_i) - E\hat{y}_i\right)\\
E_i &= 2E_{Y^0}\left(Y_i^0 - f(x_i)\right)\left(E\hat{y}_i - \hat{y}_i\right)\\
F_i &= 2E_{Y^0}\left(f(x_i) - E\hat{y}_i\right)\left(E\hat{y}_i - \hat{y}_i\right) = 2\left(f(x_i) - E\hat{y}_i\right)\left(E\hat{y}_i - \hat{y}_i\right).
\end{aligned}$$
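A side remark (not required for the exercise): since $Y_i^0$ is a fresh draw independent of the training data, and since $f(x_i) - E\hat{y}_i$ and $E\hat{y}_i - \hat{y}_i$ do not involve $Y_i^0$, the two cross terms containing $Y_i^0 - f(x_i)$ vanish identically, not just in expectation over $y$:

$$D_i = 2\left(f(x_i) - E\hat{y}_i\right) E_{Y^0}\left(Y_i^0 - f(x_i)\right) = 0, \qquad E_i = 2\left(E\hat{y}_i - \hat{y}_i\right) E_{Y^0}\left(Y_i^0 - f(x_i)\right) = 0.$$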

Similarly, for $\overline{\text{err}}$ we have

$$y_i - \hat{f}(x_i) = \left(y_i - f(x_i)\right) + \left(f(x_i) - E\hat{y}_i\right) + \left(E\hat{y}_i - \hat{y}_i\right),$$

and

$$\overline{\text{err}} = \frac{1}{N}\sum_{i=1}^N \left(y_i - f(x_i) + f(x_i) - E\hat{y}_i + E\hat{y}_i - \hat{y}_i\right)^2 = \frac{1}{N}\sum_{i=1}^N \left(G_i + B_i + C_i + H_i + J_i + F_i\right),$$

where

$$\begin{aligned}
G_i &= \left(y_i - f(x_i)\right)^2\\
H_i &= 2\left(y_i - f(x_i)\right)\left(f(x_i) - E\hat{y}_i\right)\\
J_i &= 2\left(y_i - f(x_i)\right)\left(E\hat{y}_i - \hat{y}_i\right).
\end{aligned}$$

Therefore, we have

$$E_y(\text{op}) = E_y\left(\text{Err}_{\text{in}} - \overline{\text{err}}\right) = \frac{1}{N}\sum_{i=1}^N E_y\left[(A_i - G_i) + (D_i - H_i) + (E_i - J_i)\right].$$

For $A_i$ and $G_i$, the expectation over $y$ captures the same irreducible error: $E_y G_i = E_y\left(y_i - f(x_i)\right)^2 = A_i$, so $E_y(A_i - G_i) = 0$. Similarly, $E_y D_i = E_y E_i = E_y H_i = 0$, since each of these terms contains a factor $E_{Y^0}\left(Y_i^0 - f(x_i)\right) = 0$ or $E_y\left(y_i - f(x_i)\right) = 0$ multiplying a quantity that is constant with respect to that expectation. Thus

$$\begin{aligned}
E_y(\text{op}) &= -\frac{1}{N}\sum_{i=1}^N E_y J_i = -\frac{2}{N}\sum_{i=1}^N E_y\left[\left(y_i - f(x_i)\right)\left(E\hat{y}_i - \hat{y}_i\right)\right]\\
&= \frac{2}{N}\sum_{i=1}^N \left[E_y\left(y_i \hat{y}_i\right) - E_y y_i\, E_y \hat{y}_i\right] = \frac{2}{N}\sum_{i=1}^N \operatorname{Cov}(y_i, \hat{y}_i),
\end{aligned}$$

as given in (7.21).
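As a quick sanity check of the result (not part of the exercise), the identity can be verified by Monte Carlo for ordinary least squares, where $\hat{y} = Hy$ with hat matrix $H$, $\operatorname{Cov}(y_i, \hat{y}_i) = \sigma^2 H_{ii}$, and hence the average optimism is $2d\sigma^2/N$ for $d$ parameters. A minimal NumPy sketch, with an illustrative simulation setup of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, sigma, reps = 50, 3, 1.0, 20000

# Fixed design (intercept + 2 covariates) and true regression function f.
X = np.column_stack([np.ones(N), rng.standard_normal((N, d - 1))])
beta = rng.standard_normal(d)
f = X @ beta
H = X @ np.linalg.solve(X.T @ X, X.T)            # hat matrix: yhat = H @ y

Y = f + sigma * rng.standard_normal((reps, N))   # many independent training sets
Yhat = Y @ H.T                                   # OLS fit for every training set

err = np.mean((Y - Yhat) ** 2, axis=1)                  # training error per set
err_in = sigma**2 + np.mean((f - Yhat) ** 2, axis=1)    # in-sample error (7.18)
avg_op = np.mean(err_in - err)                          # estimate of E_y(op)

# (2/N) * sum_i Cov(y_i, yhat_i), estimated across the simulated training sets
cov_sum = np.sum(np.mean(Y * Yhat, axis=0) - Y.mean(axis=0) * Yhat.mean(axis=0))
cov_term = 2.0 / N * cov_sum

print(avg_op, cov_term, 2 * sigma**2 * d / N)   # all three should nearly agree
```

With 20,000 replications, both the directly averaged optimism and the covariance expression settle near the theoretical value $2d\sigma^2/N = 0.12$ for this setup.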