Ex. 11.2

Consider a neural network for a quantitative outcome as in (11.5), using squared-error loss and identity output function \(g_k(t) = t\). Suppose that the weights \(\alpha_m\) from the input to hidden layer are nearly zero. Show that the resulting model is nearly linear in the inputs.

Soln. 11.2

An implicit assumption in this exercise is that the network uses the sigmoid activation function (in fact, any activation function that is approximately linear near the origin will do). Recall that the sigmoid activation function is

\[\begin{equation} \sigma(\nu) = \frac{1}{1 + e^{-\nu}}.\nonumber \end{equation}\]

Note that \(\sigma(0) = \frac{1}{2}\) and \(\sigma'(\nu) = \sigma(\nu)\left(1 - \sigma(\nu)\right)\), so that \(\sigma'(0) = \frac{1}{4}\). A first-order Taylor expansion of \(\sigma\) around the origin therefore gives

\[\begin{eqnarray} \sigma(\nu) &=& \sigma(0) + \sigma'(0)\nu + O(\nu^2)\nonumber\\ &=& \frac{1}{2} + \frac{\nu}{4} + O(\nu^2),\nonumber \end{eqnarray}\]

so \(\sigma\) is approximately affine in \(\nu\) when \(\nu\) is close to zero.
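
For completeness, the derivative identity used above follows directly from the chain rule applied to \(\sigma(\nu) = (1+e^{-\nu})^{-1}\):

\[\begin{eqnarray} \sigma'(\nu) &=& \frac{e^{-\nu}}{\left(1+e^{-\nu}\right)^2}\nonumber\\ &=& \frac{1}{1+e^{-\nu}}\cdot\frac{e^{-\nu}}{1+e^{-\nu}}\nonumber\\ &=& \sigma(\nu)\left(1 - \sigma(\nu)\right).\nonumber \end{eqnarray}\]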

When the \(\alpha_m\) are nearly zero, so are the inner products \(\alpha_m^TX\). If we absorb the bias into \(X\) (that is, augment \(X\) with a leading 1 so that \(\alpha_m\) contains \(\alpha_{0m}\)), the expansion above gives \(Z_m = \sigma(\alpha_m^TX) \approx \frac{1}{2} + \frac{1}{4}\alpha_m^TX\), which is linear in \(X\). Since \(g_k\) is the identity, the output is

\[\begin{eqnarray} f_k(X) = \beta_{0k} + \beta_k^TZ &\approx& \beta_{0k} + \sum_{m}\beta_{km}\left(\frac{1}{2} + \frac{1}{4}\alpha_m^TX\right)\nonumber\\ &=& \left(\beta_{0k} + \frac{1}{2}\sum_m\beta_{km}\right) + \frac{1}{4}\left(\sum_m\beta_{km}\alpha_m\right)^TX,\nonumber \end{eqnarray}\]

an affine function of the inputs. Therefore the resulting model is nearly linear in \(X\).
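
As a quick numerical sanity check (not part of the original solution), the sketch below builds a single-hidden-layer network of the form (11.5) with very small first-layer weights, keeping the biases \(\alpha_{0m}\) explicit rather than absorbed into \(X\), and compares its output with the affine function derived above. All sizes, seeds, and variable names are illustrative choices.

```python
# Numerical sketch: a one-hidden-layer sigmoid network with near-zero
# first-layer weights is essentially an affine function of the inputs.
import numpy as np

rng = np.random.default_rng(0)

p, M, n = 5, 10, 200        # number of inputs, hidden units, and test points
scale = 1e-3                # "nearly zero" scale for the first-layer weights

alpha0 = scale * rng.standard_normal(M)      # hidden-layer biases alpha_{0m}
alpha = scale * rng.standard_normal((M, p))  # hidden-layer weights alpha_m
beta0 = rng.standard_normal()                # output bias beta_{0k}
beta = rng.standard_normal(M)                # output weights beta_{km}

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def network(X):
    """f(X) = beta0 + beta^T Z with Z_m = sigma(alpha_{0m} + alpha_m^T X)."""
    Z = sigmoid(alpha0 + X @ alpha.T)        # hidden activations, shape (n, M)
    return beta0 + Z @ beta                  # identity output unit

# Affine approximation implied by sigma(v) ~ 1/2 + v/4 near the origin.
intercept = beta0 + beta @ (0.5 + alpha0 / 4.0)
slope = 0.25 * alpha.T @ beta                # shape (p,)

X = rng.standard_normal((n, p))
gap = np.max(np.abs(network(X) - (intercept + X @ slope)))
print("max |network - affine approximation|:", gap)
# With scale = 1e-3 the gap is negligible relative to the output,
# i.e. the fitted model is nearly linear in X.
```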