On the Origins of Algorithmic Progress in AI

January 8, 2026

The world’s best AI researchers have been paid millions of dollars for their expertise. How much do their research breakthroughs impact AI progress compared to building bigger, more powerful datacenters? Our new paper, On the Origins of Algorithmic Progress in AI, suggests that the literature overestimates the role of algorithmic breakthroughs and underestimates the role of increased computing resources.

The sources of progress in AI capabilities are typically broken down into three distinct components:

  • Compute scaling counts the raw number of mathematical operations, or FLOPs, performed during model training. More FLOPs mean better performance. You can perform more total operations when you (1) have more total hardware or (2) run that hardware for longer.
  • Hardware efficiency quantifies the costs of each operation in monetary, energy, or time units. Better hardware efficiency implies more FLOPs can be performed for the same number of dollars, joules, or days.
  • Algorithmic efficiency encompasses the cleverness of the training procedure, quantifying the performance “bang” for each FLOP “buck.” Better algorithmic efficiency means you can achieve the same model performance while using fewer operations (see the sketch after this list for how the three factors compose).

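To make the decomposition concrete, here is a minimal sketch of how the three factors compose under the standard multiplicative framing. The dollar budget, FLOPs-per-dollar figure, and 10x algorithmic multiplier are hypothetical placeholders rather than numbers from the paper, and “effective compute” is one common way to express the combination, not necessarily the paper’s own definition.

```python
# Hypothetical illustration of the standard three-factor decomposition.
# None of these numbers come from the paper; they are placeholders.

budget_dollars = 1e7           # total spend on training compute
flop_per_dollar = 1e17         # hardware efficiency: FLOPs purchased per dollar
algorithmic_efficiency = 10.0  # "bang per FLOP" relative to a reference training recipe

# Compute scaling: raw operations performed during training.
physical_flop = budget_dollars * flop_per_dollar

# "Effective compute": the physical compute a reference recipe would need
# to match the same performance.
effective_flop = physical_flop * algorithmic_efficiency

print(f"Physical compute:  {physical_flop:.1e} FLOP")
print(f"Effective compute: {effective_flop:.1e} FLOP")
```

Under the independence assumption, doubling any one of these factors doubles the result, no matter the values of the other two.
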
Typically, each of these components is treated as orthogonal to the others, such that improvements from each source can be evaluated independently. For example, hardware efficiency (in FLOPs per dollar) improved 45x between 2013 and 2024. Because compute scale and hardware efficiency are considered independent, you can expect your hardware efficiency gains to be roughly that same 45x regardless of whether you have 10 GPUs or 10,000.1

Traditionally, algorithmic efficiency is also thought to be independent of compute scale. Whether you have 10^15 FLOPs or 10^25 FLOPs, the literature assumes that when researchers discover better training techniques, applying these algorithmic improvements should always yield the same multiplicative efficiency improvement – regardless of compute scale (e.g. 10x less compute at 10^15 FLOPs and 10^25 FLOPs alike). But is this a safe assumption?
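
As a sketch of what the constant-multiplier assumption means in practice, the snippet below applies a hypothetical 10x algorithmic gain at two very different compute scales; the figures are illustrative, not measurements from the paper.

```python
def compute_needed(baseline_flop: float, algorithmic_gain: float) -> float:
    """Compute required after an algorithmic improvement, under the usual
    assumption that the gain is a constant multiplier independent of scale."""
    return baseline_flop / algorithmic_gain

gain = 10.0  # hypothetical 10x algorithmic improvement

# The same factor is assumed to apply at small and large training runs alike.
for baseline in (1e15, 1e25):
    print(f"{baseline:.0e} FLOP -> {compute_needed(baseline, gain):.0e} FLOP")
```

The question the paper raises is whether that multiplier really stays fixed as the training compute budget grows by ten orders of magnitude.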

Read the full Substack post...