Investigating Grokking Phenomena
Neural Network Learning Dynamics through Interpretable Models
Master's Thesis, Student Research Project (Studienarbeit), Bachelor's Thesis
Grokking is a fascinating phenomenon in deep learning in which neural networks first memorize their training data and then, after extended training, suddenly generalize to unseen examples. This behavior, characterized by a large gap between training and validation performance that unexpectedly closes after many optimization steps, challenges our understanding of generalization in neural networks. This thesis aims to investigate the underlying mechanisms of grokking using recently developed interpretability methods such as Sparse Crosscoders. By leveraging these methods, we can gain insight into how neural networks transition from memorization to a genuine representation of the underlying patterns, particularly in algorithmic tasks such as modular arithmetic, as sketched below.
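As a concrete illustration, a minimal grokking experiment on modular addition can be set up in a few dozen lines. The following sketch (in PyTorch; all hyperparameters such as the modulus p = 97, the train fraction, the network width, and the weight-decay value are illustrative assumptions, not prescriptions from this topic description) trains a small MLP on (a + b) mod p and logs train and validation accuracy over many optimization steps:

```python
# Minimal sketch of a grokking setup on modular addition, assuming PyTorch.
# All hyperparameters (p, train fraction, width, weight decay) are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 97  # modulus; the task is predicting (a + b) mod p

# Full dataset: every pair (a, b) with label (a + b) % p.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p

# A small train fraction makes the memorization-to-generalization gap visible.
perm = torch.randperm(len(pairs))
n_train = int(0.3 * len(pairs))
train_idx, val_idx = perm[:n_train], perm[n_train:]

class MLP(nn.Module):
    def __init__(self, p, width=256):
        super().__init__()
        self.embed = nn.Embedding(p, 64)
        self.net = nn.Sequential(
            nn.Linear(128, width), nn.ReLU(), nn.Linear(width, p)
        )

    def forward(self, x):
        # Concatenate the embeddings of the two operands.
        e = self.embed(x)              # (batch, 2, 64)
        return self.net(e.flatten(1))  # (batch, p) logits

model = MLP(p)
# Weight decay is widely reported as important for grokking to occur.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

for step in range(20_000):  # extended training is essential: grokking appears late
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:6d}  "
              f"train acc {accuracy(train_idx):.3f}  "
              f"val acc {accuracy(val_idx):.3f}")
```

In a typical grokking run, training accuracy saturates near 1.0 early, while validation accuracy stays near chance for many thousands of steps before rising sharply; the logged curves make this delayed generalization directly visible and provide the checkpoints that interpretability tools can then analyze.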
