RBM as abstractors

Unsupervised pre-training by stacking RBMs, followed by supervised fine-tuning (backpropagation), works better than backpropagation alone. One could think that pipelining a data distribution through a stack of RBMs would lose more and more of the original distribution, as it would when pipelined through a stack of random weight matrices. Yet the stack manages to translate the original distribution into another one that is more suitable for learning in the final MLP layer. Is this because RBMs learn a kind of abstraction of the input distribution, thus facilitating generalization?
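
To make the "pipelining" concrete, here is a minimal NumPy sketch of greedy layer-wise pre-training, assuming binary (Bernoulli) units and one step of contrastive divergence (CD-1). The class and function names (`RBM`, `pretrain_stack`), the toy data, and all hyperparameters are illustrative assumptions rather than anything taken from the papers. Supervised fine-tuning is left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with CD-1 (illustrative sketch)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to the visibles and up again.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)

def pretrain_stack(data, layer_sizes, epochs=5, batch_size=20, seed=0):
    """Train one RBM per layer; each RBM sees the hidden
    probabilities of the previous one as its input."""
    rng = np.random.default_rng(seed)
    rbms, layer_input = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden, rng)
        for _ in range(epochs):
            for start in range(0, len(layer_input), batch_size):
                rbm.cd1_step(layer_input[start:start + batch_size])
        rbms.append(rbm)
        # Pipe the data through the trained layer to feed the next one.
        layer_input = rbm.hidden_probs(layer_input)
    return rbms, layer_input

# Toy binary data, purely for illustration.
toy_rng = np.random.default_rng(1)
toy_data = (toy_rng.random((100, 16)) < 0.5).astype(float)
stack, top_features = pretrain_stack(toy_data, layer_sizes=[12, 8])
```

In the full procedure, the learned weights would initialize an MLP whose output layer is added on top of `top_features`, and the whole network would then be fine-tuned with backpropagation.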


1 Response to “RBM as abstractors”

  1. 1 Caglar Gulcehre April 4, 2013 at 06:33

    There are several empirical experiments that show the benefits of unsupervised pretraining in terms of generalization power. For example, see:

    Bengio, Yoshua, et al. “Greedy layer-wise training of deep networks.” Advances in neural information processing systems 19 (2007): 153.

    Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. “A fast learning algorithm for deep belief nets.” Neural computation 18.7 (2006): 1527-1554.

    The main idea is that as you go deeper, you are more likely to learn more abstract features. For example, see the filters in this paper:

    Lee, Honglak, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. “Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations.”

    Ideally, in a deep network the lower layers represent lower-level features such as edges (just as in an animal's visual cortex, which we know is organized hierarchically, with lower layers tending to be responsible for lower-level features). As you go deeper, more abstract features, e.g. faces, tend to emerge in your filters.

    Yes, you might lose some information about the training data as you go deeper. But if you want to generalize well (and not overfit), you need to forget some information about the training dataset, and in most situations that is not very harmful. See the bias-variance tradeoff; this is why unsupervised pretraining acts like a regularizer:

    Erhan, Dumitru, et al. “Why does unsupervised pre-training help deep learning?.” The Journal of Machine Learning Research 11 (2010): 625-660.
