Definition

Scaling laws describe how an AI model gets predictably better as you increase three things: its size, its training data, and the computing power used to build it.

At a glance

Three levers: model size, training data, and compute. Turn all three up in balance and skill reliably improves^[1].
The improvement follows a power law: early spend buys big gains, then the curve flattens into diminishing returns^[4].
Because the trend is predictable, labs can forecast a model’s quality before paying to build it^[3].
Doubling spend does not double quality.
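
To make the power-law point concrete, here is a minimal sketch. The function names and the constants a and b are made up for illustration and are not fitted to any real model; only the shape of the curve matters.

```python
# Illustrative only: the constants a and b are invented, not measured.
def loss(compute: float, a: float = 10.0, b: float = 0.05) -> float:
    """Hypothetical power law: error (lower is better) falls as compute ** -b."""
    return a * compute ** -b


def gain_from_doubling(compute: float) -> float:
    """Absolute drop in error when the compute budget is doubled."""
    return loss(compute) - loss(2 * compute)


for budget in (1e18, 1e20, 1e22):
    print(
        f"budget {budget:.0e}: error {loss(budget):.3f}, "
        f"doubling saves {gain_from_doubling(budget):.4f}"
    )
```

Under these assumed constants, each doubling multiplies the error by the same factor (2 ** -0.05, roughly a 3% cut), so the absolute saving shrinks as the budget grows.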

How it works

Increasing size, data, and compute together raises performance in a steady, measurable way that holds across a huge range of model sizes, so results can be estimated in advance.
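
One way such an estimate could be made is sketched below: fit a straight line to results from small runs on a log-log scale, then extend it to a larger budget. The data points are synthetic, generated from an assumed power law, and the helper names are hypothetical. This is not a description of any lab's actual procedure.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares line through (log x, log y).

    Returns (a, b) such that y is approximately a * x ** -b.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly)) / sum(
        (x - mean_x) ** 2 for x in lx
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict(a, b, x):
    """Evaluate the fitted power law at a new budget x."""
    return a * x ** -b


# Synthetic "small run" results (assumed, not real measurements).
small_compute = [1e17, 1e18, 1e19, 1e20]
small_error = [8.0 * c ** -0.04 for c in small_compute]

a, b = fit_power_law(small_compute, small_error)
forecast = predict(a, b, 1e23)  # extrapolate to a budget 1000x larger
print(f"fitted a={a:.3f}, b={b:.3f}, forecast error at 1e23: {forecast:.3f}")
```

Real measurements are noisy, so an extrapolation like this carries uncertainty that grows the further it reaches beyond the fitted range.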

Why bigger is not always better

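After a point, each extra dollar buys a smaller gain than the last, and a bigger model is not automatically the best use of a fixed budget. DeepMind’s 2022 Chinchilla study showed this: a 70B-parameter model trained on more data beat a 280B-parameter one on the same compute budget^[2]. The rule of thumb is about 20 tokens of training data per parameter, where a token is a short chunk of text, often a word or part of one.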
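
The 20-per-parameter rule of thumb can be turned into a rough budget calculator. The sketch below counts data in tokens and assumes the widely used approximation that training cost is about 6 × parameters × tokens floating-point operations. That approximation and the function name are assumptions added here, not something the text above states.

```python
import math

TOKENS_PER_PARAM = 20        # rule of thumb quoted above
FLOPS_PER_PARAM_TOKEN = 6    # assumed approximation: FLOPs ~ 6 * params * tokens


def compute_optimal(flops: float):
    """Split a compute budget into (parameters, tokens) under the 20:1 rule.

    Solves flops = 6 * params * (20 * params) for params.
    """
    params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


# Budget implied by a 70B-parameter model trained on 20 tokens per parameter.
budget = FLOPS_PER_PARAM_TOKEN * 70e9 * (TOKENS_PER_PARAM * 70e9)
params, tokens = compute_optimal(budget)
print(f"budget {budget:.2e} FLOPs: {params:.2e} parameters, {tokens:.2e} tokens")
```

Under these assumptions, spending the same budget on a 280B-parameter model would leave room for only a quarter as many training tokens, which is the trade-off the Chinchilla comparison highlights.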

Bottom line

Don’t ask “how big can we go?” Ask “what is the cheapest model, with the best data, that does the job?”


References