What is the appropriate way to implement a differentiable variant of an L0 regularizer (counting the non-zero values in a Conv layer / weight matrix) for a Keras layer?
I was thinking of using r(x) = tanh(abs(f*x))
for x the weight matrix, where f is a factor that scales the range in which a value is considered small (tanh(0.55) ≈ 0.5, so for f = 1: r(0.55) ≈ 0.5).
- Is the regularizer result simply added to the loss in the training loop?
- Should the result of the regularizer function be divided by the size of the matrix (to get a mean, like the loss function)? I have not seen that in the built-in L1 or L2 classes; there it is a sum. I want the penalty to be independent of layer size.
https://keras.io/api/layers/regularizers/#creating-custom-regularizers
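A minimal sketch of how this could look, assuming TensorFlow Keras and following the subclassing pattern from the linked docs; the class name L0Tanh and the f/strength parameter names are illustrative, not part of any Keras API:

```python
import tensorflow as tf
from tensorflow import keras

class L0Tanh(keras.regularizers.Regularizer):
    """Differentiable L0 surrogate: mean of tanh(|f * x|) over the weights."""

    def __init__(self, f=1.0, strength=0.01):
        self.f = f                 # scale: controls which magnitudes count as "non-zero"
        self.strength = strength   # overall weight of the penalty in the total loss

    def __call__(self, x):
        # reduce_mean (not reduce_sum) keeps the penalty independent of layer size,
        # unlike the built-in L1/L2 regularizers, which sum over the weights.
        return self.strength * tf.reduce_mean(tf.tanh(tf.abs(self.f * x)))

    def get_config(self):
        return {"f": self.f, "strength": self.strength}

# Usage: attach it to a layer like any built-in regularizer.
layer = keras.layers.Conv2D(32, 3, kernel_regularizer=L0Tanh(f=10.0, strength=0.01))
```

Keras collects the `__call__` result into the model's losses and adds it to the training loss automatically, so the addition is handled by the framework, and the sum-vs-mean choice is entirely up to the regularizer itself.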
1 Answer
A possibility is to use the cost function L_a(x) = 1/(1 + (a/x)²), which at x = a equals 0.5: L_a(a) = 0.5. (Here a is a threshold parameter.)
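A sketch of that penalty as a custom regularizer, under the same assumptions as above (TensorFlow Keras; the name SmoothL0 is illustrative). Note that 1/(1 + (a/x)²) is algebraically equal to x²/(x² + a²), which avoids the division by zero at x = 0:

```python
import tensorflow as tf
from tensorflow import keras

class SmoothL0(keras.regularizers.Regularizer):
    """Penalty L_a(x) = x^2 / (x^2 + a^2), averaged over the weights; L_a(a) = 0.5."""

    def __init__(self, a=0.1, strength=0.01):
        self.a = a                 # threshold: |x| >> a contributes ~1, |x| << a contributes ~0
        self.strength = strength   # overall weight of the penalty in the total loss

    def __call__(self, x):
        sq = tf.square(x)
        # x^2 / (x^2 + a^2) is the same function as 1/(1 + (a/x)^2) but is defined at x = 0.
        return self.strength * tf.reduce_mean(sq / (sq + self.a ** 2))

    def get_config(self):
        return {"a": self.a, "strength": self.strength}
```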