Algorithmic progress in language models

Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla

March 9, 2024
We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012–2023, we find that the compute required to reach a fixed performance threshold has halved roughly every 8 months (95% CI: 5 to 14 months), substantially faster than hardware gains under Moore's Law. We estimate augmented scaling laws that let us quantify the relative contributions of algorithmic progress and of scaling up compute, and find that, despite the rapid pace of algorithmic advances, the growth in compute made an even larger contribution to performance improvements over this period.
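For intuition, the headline estimate can be converted into an annual rate: a halving time of about 8 months corresponds to the compute needed for a fixed level of performance shrinking by a factor of roughly 2.8 per year, compared with about 1.4 per year under an assumed two-year Moore's Law doubling time. The conversion below is an illustrative calculation based only on the reported halving time.

\[
2^{12/8} = 2^{1.5} \approx 2.8 \ \text{per year (algorithmic progress)},
\qquad
2^{12/24} \approx 1.4 \ \text{per year (two-year Moore's Law doubling)}.
\]

The augmented scaling laws referred to above can be thought of schematically as a Chinchilla-style loss curve in which model size and data are rescaled by year-dependent efficiency factors; the parameterization below is a sketch chosen for illustration and need not match the exact functional form fitted in the paper.

\[
L(N, D, Y) = E + \frac{A}{\left(N\, e^{\alpha_N (Y - Y_0)}\right)^{\alpha}} + \frac{B}{\left(D\, e^{\alpha_D (Y - Y_0)}\right)^{\beta}},
\]

where $N$ is parameter count, $D$ is the number of training tokens, $Y$ is publication year, $Y_0$ is a reference year, and the exponential terms capture algorithmic progress as growth in effective model size and effective data.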