Algorithmic progress in language models

Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla

March 9, 2024
We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012–2023, we find that the compute required to reach a fixed performance threshold has halved roughly every 8 months (95% CI: approximately 5 to 14 months), substantially faster than Moore's Law. We estimate augmented scaling laws to quantify the relative contributions of algorithmic progress and the scaling of compute, and find that despite rapid algorithmic advances, the increase in compute contributed even more to performance improvements over this period.
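To make the headline rate concrete, the minimal sketch below (illustrative arithmetic only, not code or results from the paper) converts a compute-halving time into the implied factor by which the compute needed to reach a fixed performance level shrinks each year; the 8-month central estimate corresponds to roughly a 2.8x reduction per year.

```python
# Illustrative sketch, not from the paper's code: translate a compute-halving
# time (in months) into the implied per-year reduction in compute required
# to reach a fixed performance level.

def annual_compute_reduction(halving_time_months: float) -> float:
    """Factor by which required compute shrinks per year, given a halving time in months."""
    return 2.0 ** (12.0 / halving_time_months)

central_estimate = 8.0    # months, central estimate quoted in the abstract
ci_bounds = (5.0, 14.0)   # months, approximate 95% confidence interval

print(f"Central: ~{annual_compute_reduction(central_estimate):.1f}x less compute per year")
for months in ci_bounds:
    print(f"Halving every {months:.0f} months -> ~{annual_compute_reduction(months):.1f}x per year")
```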
