Loss reduction sum vs mean: when to use each? - PyTorch Forums
discuss.pytorch.org
I think the disadvantage in using the sum reduction would also be that the loss scale (and gradients) depend on the batch size, so you would probably need to change the learning rate based on the batch size. While this is surely possible, a mean reduction would not make this necessary.
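The batch-size dependence described above can be checked with a little arithmetic. The following is a minimal sketch in plain Python rather than PyTorch itself: the one-parameter model `w * x`, the squared-error loss, the data values, and the helper name `grad_wrt_w` are all assumptions made for illustration. Its `reduction` argument only mimics the "sum" and "mean" options that PyTorch loss functions accept.

```python
def grad_wrt_w(w, xs, ys, reduction):
    """Gradient of the squared-error loss (w*x - y)**2 with respect to w.

    Hypothetical helper: mirrors the idea of reduction='sum' / 'mean',
    but is not a PyTorch API.
    """
    # d/dw (w*x - y)**2 = 2 * (w*x - y) * x, computed per sample
    per_sample = [2.0 * (w * x - y) * x for x, y in zip(xs, ys)]
    total = sum(per_sample)
    if reduction == "sum":
        return total
    if reduction == "mean":
        return total / len(per_sample)
    raise ValueError(f"unknown reduction: {reduction!r}")


w = 0.5

# A batch of 2 samples, and the same samples repeated 4 times (batch of 8).
small_x, small_y = [1.0, 2.0], [2.0, 4.0]
large_x, large_y = small_x * 4, small_y * 4

print(grad_wrt_w(w, small_x, small_y, "sum"))   # -15.0
print(grad_wrt_w(w, large_x, large_y, "sum"))   # -60.0
print(grad_wrt_w(w, small_x, small_y, "mean"))  # -7.5
print(grad_wrt_w(w, large_x, large_y, "mean"))  # -7.5
```

With the sum reduction the gradient grows 4x when the batch grows 4x, so a fixed learning rate would take a step 4x larger. With the mean reduction the gradient stays the same, which is why the learning rate does not need retuning for this reason alone.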