Dropout in convolutional layers and ReLU VS tanh

Q1. In the paper ImageNet Classification with Deep Convolutional Neural Networks, I noticed that dropout is not used in any convolutional layer. Of the network's 8 layers (the first 5 convolutional, the final 3 fully connected), dropout is used only in the first 2 fully-connected layers. Either the authors did not try it in the convolutional layers (presumably for a reason), or they tried it and the results were worse than without it. Can you think of any reasons why dropout in a convolutional layer might be contraindicated?
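To make the question concrete, here is a minimal NumPy sketch of dropout as I understand the paper to describe it: during training each hidden unit's output is set to zero with probability 0.5, and at test time all units are kept but their outputs are multiplied by 0.5. The function names, the array shape, and the random seed are my own assumptions for illustration, not anything taken from the authors' code.

```python
import numpy as np

def dropout_train(activations, rng, drop_prob=0.5):
    """Training pass: zero each unit independently with probability drop_prob."""
    keep_mask = rng.random(activations.shape) >= drop_prob
    return activations * keep_mask

def dropout_test(activations, drop_prob=0.5):
    """Test pass: keep every unit but scale outputs by the keep probability."""
    return activations * (1.0 - drop_prob)

# Hypothetical output of a fully-connected layer: 4 examples, 8 hidden units.
rng = np.random.default_rng(0)
fc_out = np.ones((4, 8))

train_out = dropout_train(fc_out, rng)  # entries are either 0.0 or 1.0
test_out = dropout_test(fc_out)         # every entry scaled by 0.5
```

Nothing in this sketch is specific to fully-connected layers: the same masking could be applied to a convolutional feature map, which is why the authors' choice to restrict it to two layers seems worth asking about.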

Q2. In Section 3.1 of the same paper, there is a graph comparing training speed with Rectified Linear Units (ReLUs) against tanh units, which suggests that networks with ReLUs learn much faster. However, the graph only covers the first few epochs of training; it does not show which nonlinearity gives the better performance after much more training. Has anyone investigated this question?
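For context, the paper attributes the speed difference to tanh being a saturating nonlinearity while ReLU, f(x) = max(0, x), is not. The short sketch below only illustrates that saturation by comparing the two derivatives at a few positive inputs; the sample inputs and helper names are my own choices, and it says nothing about final accuracy, which is the open part of the question.

```python
import numpy as np

def tanh_grad(x):
    """Derivative of tanh(x), which shrinks toward 0 as |x| grows."""
    return 1.0 - np.tanh(x) ** 2

def relu_grad(x):
    """Derivative of max(0, x): 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(float)

# Hypothetical pre-activation values, from small to large.
x = np.array([0.5, 2.0, 5.0])

print("tanh gradient:", tanh_grad(x))
print("ReLU gradient:", relu_grad(x))
```

The tanh gradient decays rapidly as the input grows while the ReLU gradient stays at 1 for any positive input, so gradient descent updates passing through a saturated tanh unit become very small.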



1 Response to “Dropout in convolutional layers and ReLU VS tanh”

  1. 1 zhaoyangyang March 14, 2013 at 01:03

    Q1 Dropout will make training take much longer; if it were applied at every layer, training might take too long.

    Q2 In most cases, ReLU trains faster and reaches a better final error rate, but in some cases this might not hold.
