Q1: On page 12: "For example, some input features may be 0 most of the time while others are non-zero frequently. In that case, there are fewer examples that inform the model about that rarely active input feature, and the corresponding parameters (the weights outgoing from the corresponding input units) should be more regularized than the parameters associated with frequently observed inputs." In practice, how do we regularize some parameters more than others within the same layer?

Q2: How much influence can the choice of random seed have on the result of learning algorithms?


In response to Q1: for a rarely active feature, the input value is zero most of the time, so the gradient of the data-fitting term with respect to the corresponding weights is zero on most examples. Since the objective function is Sum(cost function) + regularization, the regularization term dominates the updates for those weights, and they end up more regularized than the others naturally, even with a single regularization coefficient for the whole layer.
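A small sketch of this "natural" effect (toy data, all names and numbers mine, not from the post): with one uniform L2 coefficient, the weight on a rarely active feature retains a much smaller fraction of its true value than the weight on an always-active feature, because its data gradient vanishes on most examples while the decay term acts on every step.

```python
import numpy as np

# Toy linear regression: two features with the same true weight (2.0),
# but the first one is zero on ~95% of examples.
rng = np.random.default_rng(0)
n = 2000
x_rare = rng.normal(size=n) * (rng.random(n) < 0.05)   # rarely active
x_freq = rng.normal(size=n)                            # always active
X = np.stack([x_rare, x_freq], axis=1)
y = 2.0 * x_rare + 2.0 * x_freq + 0.1 * rng.normal(size=n)

lam = 0.1                  # same L2 strength applied to both weights
w = np.zeros(2)
for _ in range(2000):
    grad = X.T @ (X @ w - y) / n + lam * w   # data term + uniform weight decay
    w -= 0.1 * grad

# Fraction of the true weight (2.0) that survives regularization:
shrinkage = w / 2.0
# The rare feature's weight is shrunk far more than the frequent feature's,
# even though both were penalized with the same lambda.
```

Intuitively, the ridge solution for each weight behaves like c_j / (v_j + lam), where v_j is the mean of x_j squared; the rare feature's small v_j makes the same lam shrink it proportionally much harder.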

In response to Q2: for convex problems, the random seed has no effect on the solution found, so we can set it deterministically. For complex, non-convex optimization tasks, such as training recurrent neural networks, the random seed can have a large impact on the solution we arrive at (or on whether we converge to a solution at all). When unsupervised pretraining is used, the whole initialization procedure (not just the random seed) needs to be carefully designed.
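A minimal illustration of the non-convex case (a toy one-dimensional example of my own, not from the post): the loss f(w) = (w² - 1)² has two minima, at w = +1 and w = -1, and which one gradient descent reaches is determined entirely by the seed that draws the initialization.

```python
import numpy as np

def train(seed, steps=500, lr=0.02):
    """Gradient descent on f(w) = (w**2 - 1)**2 from a seeded random init."""
    rng = np.random.default_rng(seed)
    w = rng.normal()                    # initialization depends on the seed
    for _ in range(steps):
        w -= lr * 4 * w * (w**2 - 1)    # gradient of (w**2 - 1)**2
    return w

# Run the same algorithm with ten different seeds; each run ends up in
# one of the two basins, near +1 or near -1, depending only on the seed.
sols = {seed: train(seed) for seed in range(10)}
```

In a convex problem there would be a single minimum and every seed would return the same answer; here the seed selects the basin of attraction.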