Alex’s CNN structure


In Alex Krizhevsky’s CNN, the first and third convolutional layers are fully connected: their kernels see all of their input maps rather than only the half stored on their own GPU. Why the first and third? What if we chose other layers, for example the second and fourth?
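One way to make the question concrete is to count weights. The sketch below is a hypothetical illustration, not code from the paper. It assumes the commonly cited AlexNet shapes for conv2 to conv5 (kernel sizes 5, 3, 3, 3; 96, 256, 384, 384 input maps; 256, 384, 384, 256 kernels) and ignores biases. conv1 is left out because both GPUs already receive the whole image. A "cross-GPU" layer's kernels see all input maps; otherwise each kernel sees only the half on its own GPU. The names `CONV_LAYERS`, `conv_weights` and `total_weights` are made up for this example.

```python
# Hypothetical sketch: count conv weights (biases ignored) in conv2..conv5 of an
# AlexNet-style two-GPU network, for a chosen set of cross-GPU layers.
# Shapes are the commonly cited AlexNet ones; conv1 is omitted because
# both GPUs already receive the whole input image.

# (layer name, kernel width/height, total input maps, total output kernels)
CONV_LAYERS = [
    ("conv2", 5, 96, 256),
    ("conv3", 3, 256, 384),
    ("conv4", 3, 384, 384),
    ("conv5", 3, 384, 256),
]


def conv_weights(k, in_maps, out_kernels, cross_gpu):
    """Weights in one conv layer whose kernels are split across two GPUs."""
    # A cross-GPU kernel sees every input map; otherwise a kernel only sees
    # the half of the input maps stored on its own GPU.
    maps_seen = in_maps if cross_gpu else in_maps // 2
    return out_kernels * k * k * maps_seen


def total_weights(cross_gpu_layers):
    """Sum the conv2..conv5 weights for a given set of cross-GPU layers."""
    return sum(
        conv_weights(k, c_in, c_out, name in cross_gpu_layers)
        for name, k, c_in, c_out in CONV_LAYERS
    )


if __name__ == "__main__":
    print("conv3 cross-GPU:        ", total_weights({"conv3"}))           # 2297856
    print("conv2 + conv4 cross-GPU:", total_weights({"conv2", "conv4"}))  # 2826240
```

Under these assumptions, connecting only conv3 across GPUs (as in the paper) gives about 2.30 million conv weights in conv2 to conv5, while connecting conv2 and conv4 instead gives about 2.83 million, mostly because conv4 has the largest number of input maps.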


Why use a network whose capacity is much larger than the training set can support (which makes it prone to overfitting)? What happens if we use a slightly smaller network?




  1. Sina Honari, March 11, 2013 at 15:28

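    Q1: Most probably these configurations gave him the best results; the paper itself treats the choice of connectivity pattern as a problem for cross-validation. At the same time, he was trying to use the biggest model that would fit on his machine while keeping the communication between the two GPUs to a minimum.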
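    The "minimum communication" point can be made concrete by counting how many activations each GPU would have to fetch from the other, per image and per forward pass, if a given conv layer were cross-GPU. This is a rough, hypothetical sketch: it assumes the commonly cited input sizes (27×27×96 into conv2 after the first pooling layer, 13×13×256 into conv3, 13×13×384 into conv4 and conv5), counts values rather than bytes, and ignores the backward pass and the fully connected layers. `CONV_INPUTS`, `values_sent_per_gpu` and `traffic` are made-up names.

```python
# Hypothetical sketch: activations each GPU must fetch from the other GPU,
# per image and per forward pass, if a given conv layer is cross-GPU.
# Input sizes are the commonly cited AlexNet ones (after pooling where applicable).

# layer name -> (spatial width/height of its input maps, total number of input maps)
CONV_INPUTS = {
    "conv2": (27, 96),
    "conv3": (13, 256),
    "conv4": (13, 384),
    "conv5": (13, 384),
}


def values_sent_per_gpu(layer):
    """Values one GPU must fetch: the other GPU's half of the input maps."""
    size, maps = CONV_INPUTS[layer]
    return size * size * (maps // 2)


def traffic(cross_gpu_layers):
    """Total values fetched per GPU for a set of cross-GPU conv layers."""
    return sum(values_sent_per_gpu(name) for name in cross_gpu_layers)


if __name__ == "__main__":
    for name in CONV_INPUTS:
        print(name, values_sent_per_gpu(name))
    print("cheapest:", min(CONV_INPUTS, key=values_sent_per_gpu))
    print("conv3 only:", traffic({"conv3"}))
    print("conv2 + conv4:", traffic({"conv2", "conv4"}))
```

    With these numbers conv3 is the cheapest conv layer to connect across GPUs (21,632 values per GPU per image, versus 34,992 for conv2 and 32,448 for conv4 or conv5), and the conv2 + conv4 alternative would move roughly three times as much data. This is consistent with the answer above, but it is only one plausible factor, not a reason stated by the author.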

    Q2: A bigger network can give the model better generalization power, but the added capacity comes at a cost: it makes overfitting easier. These side effects can be curbed with the right regularization techniques, such as dropout, early stopping, and weight decay.
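    The three regularizers mentioned in Q2 can be sketched in a few lines of NumPy. This is a minimal illustration, not AlexNet's code. The momentum (0.9), weight-decay (0.0005) and learning-rate (0.01) defaults match the values reported in the AlexNet paper, but `dropout` uses the "inverted" variant (rescaling the kept units during training) instead of the paper's approach of halving outputs at test time; the two agree in expectation. `dropout`, `sgd_step` and `EarlyStopping` are hypothetical helpers.

```python
import numpy as np


def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop during training
    and rescale the survivors so the expected activation is unchanged."""
    if not training or p_drop == 0.0:
        return activations
    keep = rng.random(activations.shape) >= p_drop
    return activations * keep / (1.0 - p_drop)


def sgd_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """Momentum SGD where weight decay adds weight_decay * w to the gradient."""
    v = momentum * v - lr * (grad + weight_decay * w)
    return w + v, v


class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = np.ones((2, 4))
    print(dropout(hidden, p_drop=0.5, rng=rng))  # each entry is 0.0 or 2.0
    w, v = sgd_step(np.array([1.0]), np.zeros(1), grad=np.zeros(1))
    print(w)  # weight decay alone shrinks w slightly toward zero
```

    Weight decay and dropout act during every update, while early stopping only decides when to stop training based on a held-out validation loss.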

