Maksym Andriushchenko

Paper_sgd_sparse_features

October 12, 2022

Our paper "SGD with large step sizes learns sparse features" is available online! TL;DR: loss stabilization achieved via SGD with large step sizes leads to hidden dynamics that promote sparse feature learning. Also see this Twitter thread for a quick summary of the main ideas.

Summary