Definition
An AI model gets predictably better as you increase three things: its size (number of parameters), its training data, and the computing power used to train it.
At a glance
- Three levers: model size, training data, and compute. Turn all three up in balance and skill reliably improves[1].
- Improvement follows a power law: early spending buys big gains, then the curve flattens into diminishing returns[4].
- Because it is predictable, labs can forecast a model’s quality before paying to build it[3].
- Doubling spend does not double quality; under a power law it trims the error by only a fixed fraction.
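To make the power-law and "doubling" points concrete, here is a minimal sketch that assumes a pure power law in compute, loss = compute^-alpha, with loss (lower is better) standing in for quality. The exponent `ALPHA = 0.05` and both function names are illustrative assumptions, not values or code from the cited papers.

```python
# Illustrative exponent chosen for this sketch; it is NOT a fitted value from [1] or [2].
ALPHA = 0.05

def loss_from_compute(compute: float, alpha: float = ALPHA) -> float:
    """Pure power law with its constant set to 1: loss = compute ** -alpha (lower is better)."""
    return compute ** -alpha

def loss_cut_from_doubling(alpha: float = ALPHA) -> float:
    """Fraction of the loss removed by doubling compute: 1 - 2 ** -alpha."""
    return 1.0 - 2.0 ** -alpha

previous = None
for budget in [1e1, 1e2, 1e3, 1e4, 1e5]:
    loss = loss_from_compute(budget)
    # Each 10x increase in compute buys a smaller absolute drop than the one before.
    drop = "" if previous is None else f"  drop={previous - loss:.4f}"
    print(f"compute={budget:>8.0f}  loss={loss:.4f}{drop}")
    previous = loss

print(f"doubling compute cuts loss by {loss_cut_from_doubling():.1%}")
```

With this assumed exponent, doubling compute removes about 3.4% of the loss, nowhere near halving it, and every further 10x step buys a smaller absolute improvement than the last.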
How it works
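Increasing size, data, and compute together lowers the model's error (test loss) in a smooth, measurable way. Kaplan et al. found some of these trends holding across more than seven orders of magnitude[1], so the performance of a large model can be estimated in advance from smaller, cheaper runs.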
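The forecasting idea can be sketched as: fit a straight line to log(loss) versus log(model size) on small runs, then extrapolate. The helpers `fit_power_law` and `predict`, and the run data, are hypothetical; the losses are noise-free synthetic numbers generated from an assumed law (a = 10, alpha = 0.08) so the fit recovers it exactly, whereas real training runs are noisy.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - alpha * log(size).

    Returns (a, alpha) such that loss ~= a * size ** -alpha.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

def predict(a, alpha, size):
    """Evaluate the fitted power law at a (possibly much larger) model size."""
    return a * size ** -alpha

# Hypothetical small-model runs: synthetic losses from an assumed law, no noise.
small_sizes = [1e6, 1e7, 1e8, 1e9]
small_losses = [10 * n ** -0.08 for n in small_sizes]

a, alpha = fit_power_law(small_sizes, small_losses)
print(f"fitted a={a:.3f}, alpha={alpha:.3f}")
# Extrapolate two orders of magnitude beyond the largest run.
print(f"forecast loss at 1e11 params: {predict(a, alpha, 1e11):.3f}")
```

A log-log line fit is the simplest version of this idea. The Chinchilla paper's fits also include an irreducible-loss term that this sketch leaves out.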
Why bigger is not always better
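After a point, each extra dollar buys a smaller gain than the last, so how the budget is split matters as much as its size. DeepMind's 2022 Chinchilla study showed this: on the same compute budget, the 70B-parameter Chinchilla, trained on about 1.4 trillion tokens, outperformed the 280B-parameter Gopher, trained on about 300 billion[2]. The rule of thumb: about 20 tokens (chunks of words) of training data per parameter.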
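The 20-tokens-per-parameter rule can be turned into a budget split. This sketch assumes the common approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6·N·D), which is not stated in this article; combined with D = 20·N it gives C = 120·N², so N = sqrt(C / 120). The function name `compute_optimal_split` is hypothetical. As a check, it is fed the budget implied by Chinchilla's 70B parameters and 1.4 trillion tokens.

```python
import math

TOKENS_PER_PARAM = 20      # the article's rule of thumb
FLOPS_PER_PARAM_TOKEN = 6  # assumption: the common C ~= 6 * N * D training-cost approximation

def compute_optimal_split(compute_flops: float) -> tuple[float, float]:
    """Split a FLOP budget into (params, tokens) with tokens = 20 * params.

    From C = 6 * N * D and D = 20 * N:  C = 120 * N**2, so N = sqrt(C / 120).
    """
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens

# FLOPs implied by 70B parameters trained on 1.4T tokens under C ~= 6ND.
budget = 6 * 70e9 * 1.4e12
n, d = compute_optimal_split(budget)
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e12:.1f}T")
```

Under these assumptions the recommended model size grows only with the square root of compute: quadrupling the budget doubles the parameter count and doubles the data, rather than quadrupling the model.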
Bottom line
Don’t ask “how big can we go?” Ask “what is the cheapest model, with the best data, that does the job?”
References
- [1] Kaplan, J., McCandlish, S., et al. Scaling Laws for Neural Language Models. OpenAI, 2020. arxiv.org
- [2] Hoffmann, J., et al. An empirical analysis of compute-optimal large language model training. Google DeepMind, 2022. deepmind.google
- [3] Neural scaling law. Wikipedia. en.wikipedia.org
- [4] LLM Scaling Laws Explained: Will Bigger AI Models Always Win. BuildFastWithAI. www.buildfastwithai.com