Investigating Grokking Phenomena: Neural Network Learning Dynamics through Sparse Crosscoders

Master's Thesis, Project, Bachelor's Thesis

Grokking is a fascinating phenomenon in deep learning where neural networks initially memorize their training data before suddenly generalizing to unseen examples after extended training. This behavior, characterized by a significant gap between training and validation performance that unexpectedly closes after many optimization steps, challenges our understanding of generalization in neural networks. This thesis aims to investigate the underlying mechanisms of grokking using recently developed interpretability methods such as Sparse Crosscoders. By leveraging these models, we can gain insight into how neural networks transition from memorization to a genuine representation of the underlying patterns, particularly in algorithmic tasks such as modular arithmetic.
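
To make the experimental setting concrete, below is a minimal sketch of the modular-arithmetic task on which grokking is commonly studied: a small network trained on (a + b) mod p with a limited training fraction, so that a large train/validation gap opens and can later close. The modulus, model size, training fraction, and weight-decay strength are illustrative assumptions, not prescriptions from this topic description.

```python
import torch
import torch.nn as nn

# Modular addition: the algorithmic task on which grokking was first reported.
p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))  # all (a, b)
labels = (pairs[:, 0] + pairs[:, 1]) % p

# A small training fraction leaves a large train/validation gap to close.
perm = torch.randperm(len(pairs))
split = int(0.3 * len(pairs))
train_idx, val_idx = perm[:split], perm[split:]

class ModAddMLP(nn.Module):
    def __init__(self, p: int, d: int = 128):
        super().__init__()
        self.embed = nn.Embedding(p, d)
        self.net = nn.Sequential(nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, p))

    def forward(self, x):
        e = self.embed(x)              # (batch, 2, d)
        return self.net(e.flatten(1))  # (batch, p) logits over residues

model = ModAddMLP(p)
# Weight decay is commonly reported as a key ingredient for grokking.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):  # grokking may only appear after many steps
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1_000 == 0:
        with torch.no_grad():
            val_acc = (model(pairs[val_idx]).argmax(-1) == labels[val_idx]).float().mean().item()
        print(f"step {step}: train loss {loss.item():.4f}, val acc {val_acc:.3f}")
```

In a typical grokking run, the training loss drops to near zero early while validation accuracy stays near chance for a long time before rising sharply; checkpoints saved across this transition are exactly the kind of material a sparse crosscoder could be trained on to compare memorizing and generalizing stages of the same model.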