What are the differences between weight penalties and weight constraints? What are the advantages and drawbacks?

When using a weight penalty, you add a new term representing the penalty to the cost function you optimize. The cost to optimize thus becomes $Cost + \lambda \|\Theta\|^2$, where $\lambda \ge 0$ controls the strength of the penalty.
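To make the effect of the penalty term concrete, here is a minimal sketch of gradient descent on a penalized cost. It assumes a toy quadratic cost $Cost(\Theta) = \|\Theta - a\|^2$ (with `a` a hypothetical target vector), and `theta`, `lam` and `lr` stand for $\Theta$, $\lambda$ and a learning rate. The point to notice is that the penalty contributes $2\lambda\Theta$ to the gradient at every step.

```python
# Minimal sketch of a weight penalty, assuming a toy quadratic cost
#   Cost(theta) = ||theta - a||^2
# whose unregularized minimum is at theta = a.

def cost_grad(theta, a):
    # Gradient of ||theta - a||^2 is 2 * (theta - a).
    return [2.0 * (t - ai) for t, ai in zip(theta, a)]

def penalized_gd(a, lam, lr=0.1, steps=500):
    theta = [0.0] * len(a)
    for _ in range(steps):
        g = cost_grad(theta, a)
        # The penalty lam * ||theta||^2 adds 2 * lam * theta to the gradient,
        # so every update pulls the weights toward zero, whatever their norm.
        theta = [t - lr * (gj + 2.0 * lam * t) for t, gj in zip(theta, g)]
    return theta

theta = penalized_gd(a=[3.0, 4.0], lam=1.0)
print(theta)  # approximately [1.5, 2.0], i.e. a / (1 + lam)
```

For this toy cost the penalized optimum has the closed form $a / (1 + \lambda)$, so a larger $\lambda$ shrinks the weights further toward zero.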

When using a weight constraint, you optimize the same cost as before, but subject to the constraint $\|\Theta\|^2 \le C$.
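One common way to enforce such a constraint during training, assumed here rather than prescribed by the text, is projected gradient descent: take an ordinary gradient step on the unpenalized cost, then rescale $\Theta$ back onto the ball $\|\Theta\|^2 \le C$ if the step left it. The sketch reuses a toy quadratic cost $\|\Theta - a\|^2$ with a hypothetical target `a`.

```python
import math

# Minimal sketch of a weight constraint ||theta||^2 <= C, implemented
# (as an assumption) by projected gradient descent on the toy cost
#   Cost(theta) = ||theta - a||^2

def cost_grad(theta, a):
    # Gradient of ||theta - a||^2 is 2 * (theta - a).
    return [2.0 * (t - ai) for t, ai in zip(theta, a)]

def project(theta, C):
    sq_norm = sum(t * t for t in theta)
    if sq_norm <= C:
        return theta  # constraint satisfied: weights are left untouched
    # Otherwise rescale theta so that its squared norm is exactly C.
    scale = math.sqrt(C / sq_norm)
    return [t * scale for t in theta]

def constrained_gd(a, C, lr=0.1, steps=500):
    theta = [0.0] * len(a)
    for _ in range(steps):
        g = cost_grad(theta, a)
        # Plain gradient step on the unpenalized cost, then projection.
        theta = project([t - lr * gj for t, gj in zip(theta, g)], C)
    return theta

theta = constrained_gd(a=[3.0, 4.0], C=6.25)
print(theta)  # approximately [1.5, 2.0]: a rescaled to norm sqrt(6.25) = 2.5
```

Note that `project` returns the weights unchanged whenever they already satisfy the constraint; the constraint only acts once a step would push $\|\Theta\|^2$ above $C$.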

Weight penalties and weight constraints are closely related: under mild conditions (for example, a convex cost), for every value of $\lambda$ used with the penalty you can find a value of $C$ for the constraint such that both problems have the same optimal solution. They differ, however, in how they act during training. A weight penalty pushes the weights toward zero from the very first update, whatever their current norm, while a weight constraint has no effect on the weights as long as their squared norm stays below $C$. This gives the weights a kind of 'grace' period during which they can evolve freely.
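The difference in training dynamics can be seen on a small example. The sketch below assumes a toy cost $\|\Theta - a\|^2$ with a hypothetical $a = (3, 4)$, and enforces the constraint by projection (an assumed implementation choice). For this cost, $\lambda = 1$ and $C = 6.25$ happen to give the same optimum, $\Theta = (1.5, 2.0)$, so the two runs should end at the same place while following different paths.

```python
import math

# Minimal sketch comparing a weight penalty and a weight constraint on an
# assumed toy cost ||theta - a||^2. The values LAM = 1 and C = 6.25 are
# chosen so that both problems share the optimum theta = [1.5, 2.0].

A = [3.0, 4.0]
LAM, C, LR = 1.0, 6.25, 0.1

def cost_grad(theta):
    # Gradient of ||theta - A||^2.
    return [2.0 * (t - ai) for t, ai in zip(theta, A)]

def penalty_step(theta):
    # Penalty gradient 2 * LAM * theta is added at every step.
    g = cost_grad(theta)
    return [t - LR * (gj + 2.0 * LAM * t) for t, gj in zip(theta, g)]

def constraint_step(theta):
    # Plain step, then projection onto ||theta||^2 <= C only if violated.
    stepped = [t - LR * gj for t, gj in zip(theta, cost_grad(theta))]
    sq_norm = sum(t * t for t in stepped)
    if sq_norm <= C:
        return stepped
    scale = math.sqrt(C / sq_norm)
    return [t * scale for t in stepped]

def run(step, steps):
    theta = [0.0, 0.0]
    norms = []
    for _ in range(steps):
        theta = step(theta)
        norms.append(math.sqrt(sum(t * t for t in theta)))
    return theta, norms

pen_theta, pen_norms = run(penalty_step, 200)
con_theta, con_norms = run(constraint_step, 200)

# Early steps: the penalty already holds the norm down, while the
# constraint does nothing until the norm would exceed sqrt(C) = 2.5.
for k in range(4):
    print(f"step {k + 1}: penalty {pen_norms[k]:.3f}  constraint {con_norms[k]:.3f}")

# Final weights: both runs end up (approximately) at [1.5, 2.0].
print(pen_theta, con_theta)
```

In the early steps the constrained run grows its weights exactly as unregularized gradient descent would, which is the 'grace' period described above; only once the norm reaches $\sqrt{C}$ does the projection start to act.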

