Definition
The Chinchilla result is a 2022 DeepMind finding that, for a fixed compute budget, language models perform best when model size and training data are scaled up together, at roughly 20 tokens (word-piece units of text) of training data per parameter.
At a glance
- For a fixed compute budget, scale model size and training data in equal proportion, not size alone[1].
- Rule of thumb: about 20 tokens of training data per model parameter[4].
- It showed that many large models of the time were undertrained: too big for the amount of data they were fed.
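The rule of thumb comes down to a single multiplication. The sketch below counts data in tokens, the sub-word units the paper uses, and assumes the approximate ratio of 20. The name `chinchilla_tokens` is a hypothetical helper chosen for illustration.

```python
# Approximate Chinchilla ratio: about 20 training tokens per parameter.
TOKENS_PER_PARAM = 20

def chinchilla_tokens(params: float) -> float:
    """Rough compute-optimal training-set size, in tokens, for a model with `params` parameters."""
    return TOKENS_PER_PARAM * params

# A 70-billion-parameter model calls for roughly 1.4 trillion tokens.
print(f"{chinchilla_tokens(70e9):.2e}")  # 1.40e+12
```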
How it works
DeepMind built Chinchilla (70 billion parameters, trained on 1.4 trillion tokens) and pitted it against its earlier model Gopher, four times larger but trained on far less data[3]. With roughly the same compute budget, the smaller Chinchilla won, and it also outperformed GPT-3 across a wide range of benchmarks[2]. Better-fed beat bigger.
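Why does this count as a same-budget comparison? A common approximation for the training compute of a dense transformer is C ≈ 6·N·D floating-point operations, where N is the parameter count and D the number of training tokens. The sketch below applies it to Chinchilla's 70 billion parameters and 1.4 trillion tokens, and assumes Gopher's published figures of 280 billion parameters and 300 billion tokens, which this entry does not state.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute of a dense transformer: C ~ 6 * N * D FLOPs."""
    return 6 * params * tokens

# Assumed figures: Gopher 280B params / 300B tokens; Chinchilla 70B params / 1.4T tokens.
gopher = training_flops(280e9, 300e9)
chinchilla = training_flops(70e9, 1.4e12)

print(f"Gopher:     {gopher:.2e} FLOPs")
print(f"Chinchilla: {chinchilla:.2e} FLOPs")
print(f"Ratio:      {chinchilla / gopher:.2f}")
```

Under this approximation both budgets land between 5 and 6 × 10^23 FLOPs, within about 17% of each other, so the gap in results comes from how the budget was split, not from how large it was.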
Why it matters
A smaller model that performs as well is cheaper every time it answers a query, lowering ongoing inference costs. This helped shift the field toward more compact models trained on more data: for a given budget, the balance of size and data, not raw size, determines performance.
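The cost argument can be made concrete with another standard approximation: generating one token with a dense transformer takes roughly 2·N FLOPs for N parameters, ignoring attention overhead. Under that assumption, per-token inference cost grows linearly with model size. The helper name below is hypothetical.

```python
def inference_flops_per_token(params: float) -> float:
    """Approximate forward-pass compute per generated token: ~2 * N FLOPs (attention ignored)."""
    return 2 * params

chinchilla_cost = inference_flops_per_token(70e9)   # 70B parameters
gopher_cost = inference_flops_per_token(280e9)      # 280B parameters

# Gopher needs about four times the compute per generated token.
print(f"Gopher / Chinchilla per-token cost: {gopher_cost / chinchilla_cost:.1f}x")  # 4.0x
```

Training is paid once, but this per-token cost recurs on every query, which is why a smaller model of equal quality keeps saving money after deployment.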
Bottom line
For any given budget, balance size and data rather than chasing the biggest model.
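Balancing size and data for a given budget can be sketched by combining the 20-tokens-per-parameter rule (D ≈ 20·N) with the common training-compute approximation C ≈ 6·N·D. Substituting gives C ≈ 120·N², so N ≈ √(C/120). This is a back-of-envelope sketch under those two assumptions, not the paper's fitted scaling law, and `compute_optimal` is a hypothetical helper.

```python
import math

TOKENS_PER_PARAM = 20  # Chinchilla rule of thumb (approximate)

def compute_optimal(flops: float) -> tuple[float, float]:
    """Split a training budget of `flops` into (params, tokens).

    Assumes C ~ 6 * N * D and D ~ 20 * N, hence C ~ 120 * N**2.
    """
    params = math.sqrt(flops / (6 * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens

# Feed in Chinchilla's own approximate budget, 6 * 70e9 * 1.4e12 FLOPs.
budget = 6 * 70e9 * 1.4e12
n, d = compute_optimal(budget)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")  # params ~ 7.00e+10, tokens ~ 1.40e+12
```

As a sanity check, the budget Chinchilla was trained with maps back to its own size and data. Doubling the budget raises both parameters and tokens by about √2 rather than pouring all of it into size.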