Reconciling grokking with statistical learning theory through the lens of norm- and stability-based generalization bounds

Oneto L.; Ridella S.; Minisi S.; Coraddu A.; Anguita D.

2026-01-01

Abstract

In recent years, Artificial Intelligence, particularly Machine Learning, has achieved remarkable success in solving complex problems. However, this progress has also revealed unexpected, poorly understood, and elusive phenomena in the behavior of machine intelligence and learning processes. These phenomena are often difficult to interpret within existing Machine Learning theoretical frameworks, which motivates the development of new and more comprehensive theoretical foundations. One such phenomenon, known as grokking, refers to the sudden and substantial improvement in a model's performance following a prolonged period of stagnant or even regressive learning. In this paper, we argue that the existing theoretical foundations of Machine Learning, in particular concepts from Statistical Learning Theory such as norm-based and stability-based generalization bounds, can provide insight into grokking. We show how these theories help reconcile grokking with established principles of learning and generalization, and we demonstrate the practical applicability of these insights through concrete examples.

Year

2026

Appears in types:

01.01 - Journal article

Files in this record:

File	Size	Format
J096 - NEUCOM.pdf (closed access; type: post-print document)	3.63 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/1297268
