Neel Nanda argues that the lottery ticket hypothesis may explain why neural networks form sophisticated circuits.
The vital insight of the lottery ticket hypothesis paper is that it may also be possible to prune the network before training to make both training and inference more efficient.
At initialization, the neurons in the subcircuits they're finding [in the multi-prize lottery ticket hypothesis paper] would not light up in recognition of a dog, because they're still connected to a bunch of other stuff that's not in the subcircuit - the subcircuit only detects dogs once the other stuff is disconnected.
Do they provide evidence for the lottery ticket conjecture as well?
If the approach of the original LTH paper (first train the dense network, then choose the winning ticket and rewind its weights to their initial values) and the approach of most later papers (use supermasks to find the winning ticket without training the original network at all) were found to produce almost identical subnetworks, then that would constitute very strong ...
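The train, prune, rewind procedure described above, and the comparison of two subnetworks, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the setup of any of the papers: the model is a small linear regressor, the winning ticket is chosen by one round of magnitude pruning, and the Jaccard overlap used to compare masks is an assumed similarity measure that the source does not specify. The names `magnitude_mask`, `mask_overlap`, and `train_linear` are hypothetical helpers.

```python
import numpy as np

def magnitude_mask(weights, keep_fraction):
    """Binary mask that keeps the largest-magnitude fraction of weights."""
    k = max(1, int(round(keep_fraction * weights.size)))
    # The k-th largest absolute value serves as the pruning threshold.
    threshold = np.sort(np.abs(weights), axis=None)[-k]
    return (np.abs(weights) >= threshold).astype(weights.dtype)

def mask_overlap(mask_a, mask_b):
    """Jaccard overlap of two binary masks (assumed metric; 1.0 = identical)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union

def train_linear(w_init, x, y, mask=None, steps=200, lr=0.1):
    """Gradient descent on mean squared error; pruned weights stay at zero."""
    w = w_init.copy()
    if mask is not None:
        w *= mask
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(x)
        if mask is not None:
            grad *= mask  # no updates flow into pruned weights
        w -= lr * grad
    return w

# Toy data: only the first 5 of 20 input features matter.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 20))
true_w = np.zeros(20)
true_w[:5] = rng.normal(size=5)
y = x @ true_w

w0 = rng.normal(size=20) * 0.1            # initial weights
w_dense = train_linear(w0, x, y)          # 1. train the dense network
mask = magnitude_mask(w_dense, 0.25)      # 2. choose the winning ticket
ticket = train_linear(w0, x, y, mask)     # 3. rewind to w0 and retrain the ticket
```

A mask produced by a different method (for example a supermask search, which is not implemented here) could then be compared against `mask` with `mask_overlap(mask, other_mask)`; a value close to 1.0 would indicate that the two methods select nearly the same subnetwork.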