Reuse idea and Error measurements


Describe some ways to exploit the reuse idea described in section 5.3, and explain why it is interesting in the context of learning with symbolic data.


In the video 4c Another diversion: The softmax output function, Hinton says that with the squared error measure we are depriving the network of the knowledge that the outputs should sum to 1. But isn’t that fixed by the softmax function rather than by the cross-entropy function? Wouldn’t this knowledge (summing to 1) still be present if one used the squared error measure on a softmax output function? (even if it makes less sense to do so)

Besides this, what are the drawbacks of the squared error measure compared to cross-entropy?
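To make the first part of the question concrete: the sum-to-1 constraint comes from the softmax itself, independently of the loss attached to it. A minimal NumPy sketch (logits and target chosen only for illustration) that also shows the gradient of each loss with respect to the logits:

```python
import numpy as np

def softmax(z):
    # shift by max for numerical stability
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([2.0, 1.0, -3.0])   # arbitrary logits
p = softmax(z)
# the outputs sum to 1 regardless of which error measure is used afterwards
print(p.sum())                   # 1 (up to float rounding)

y = np.array([1.0, 0.0, 0.0])    # one-hot target

# cross-entropy on softmax: gradient w.r.t. the logits is simply p - y
grad_ce = p - y

# squared error on softmax: the softmax Jacobian enters the gradient
J = np.diag(p) - np.outer(p, p)
grad_se = J @ (2 * (p - y))

print(grad_ce)
print(grad_se)
```

So squared error on a softmax does keep the sum-to-1 property; the difference shows up in the gradient, where the extra Jacobian factor can shrink the error signal when the outputs saturate.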


1 Response to “Reuse idea and Error measurements”

  1. Magatte Diagne — February 11, 2013 at 13:21

    Q1 :

    The units of the first hidden layer can have the same incoming weights.

    Q2 :

    For the squared error, saturation of the neurons can cause a problem. With p = sigmoid(a), we have d/da (sigmoid(a) – y)^2 = 2 (p – y) p (1 – p), so if p is near 0 or 1 the derivative is very small.
    But if we use a log probability (cross-entropy) loss, whose gradient with respect to a is simply p – y, we can still propagate the gradient.
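The saturation effect the reply describes can be checked numerically. A small sketch (assuming a single sigmoid unit with target y = 1 and a logit a = −6 chosen to put the unit deep in saturation on the wrong side):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

y = 1.0    # target
a = -6.0   # logit: the unit is confidently wrong (p near 0)

p = sigmoid(a)

# squared error: d/da (sigmoid(a) - y)^2 = 2*(p - y)*p*(1 - p)
grad_se = 2 * (p - y) * p * (1 - p)

# cross-entropy: d/da [-y*log(p) - (1-y)*log(1-p)] = p - y
grad_ce = p - y

# the squared-error gradient is tiny while the cross-entropy gradient stays large
print(abs(grad_se), abs(grad_ce))
```

Even though the unit is badly wrong, the squared-error gradient is on the order of 0.005, while the cross-entropy gradient is close to 1, so learning with cross-entropy does not stall at saturation.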
