Definition

The Chinchilla result is a 2022 DeepMind finding that, for a fixed training compute budget, language models perform best when model size and training data grow together, at roughly 20 tokens of training data per parameter (a token is a word or word fragment).

At a glance

For a fixed budget, scale model size and training data together, not just size^[1].
Rule of thumb: about 20 tokens of training data per model parameter^[4].
It showed that the large models of the time were too big for their budgets and trained on too little data.
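
The rule of thumb can be turned into arithmetic. The sketch below measures data in tokens and assumes the commonly used approximation that training compute is about 6 × parameters × tokens floating-point operations. That approximation is not stated in this article, and the function name `compute_optimal_split` is purely illustrative. Given a compute budget, it returns the model size and data size that spend the budget at 20 tokens per parameter.

```python
import math

TOKENS_PER_PARAM = 20.0       # rule of thumb from the text
FLOPS_PER_PARAM_TOKEN = 6.0   # assumed approximation: compute ≈ 6 * params * tokens


def compute_optimal_split(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Return (params, tokens) that spend compute_flops at the given ratio.

    Hypothetical helper for illustration only.
    """
    if compute_flops <= 0:
        raise ValueError("compute_flops must be positive")
    # compute = 6 * N * D and D = r * N  =>  N = sqrt(compute / (6 * r))
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


# Example budget chosen for illustration: 5.88e23 floating-point operations.
params, tokens = compute_optimal_split(5.88e23)
print(f"{params:.2e} parameters, {tokens:.2e} tokens")
```

Under these assumptions, a budget of 5.88e23 operations lands at about 70 billion parameters and 1.4 trillion tokens. Because both quantities grow with the square root of the budget, quadrupling the budget doubles each of them rather than quadrupling model size alone.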

How it works

DeepMind built Chinchilla (70 billion parameters, trained on 1.4 trillion tokens) and pitted it against Gopher, a model four times larger (280 billion parameters) but trained on far less data^[3]. Both used the same compute budget, yet the smaller Chinchilla won, and it also beat GPT-3 across many tests^[2]. Better-fed beat bigger.
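
The "same budget" claim can be checked with rough arithmetic. This is a minimal sketch, assuming the common approximation that training compute is about 6 × parameters × tokens. Chinchilla's figures come from the paragraph above. Gopher's 280 billion parameters follow from "four times larger", while its 300 billion training tokens is a published figure that this article does not state, so treat it as an assumption.

```python
def training_flops(params, tokens):
    """Approximate training compute, assuming compute ≈ 6 * params * tokens."""
    return 6.0 * params * tokens


models = {
    # Figures from the text above.
    "Chinchilla": {"params": 70e9, "tokens": 1.4e12},
    # 4x Chinchilla's parameters (from the text); the token count is an
    # assumption taken from published reports, not from this article.
    "Gopher": {"params": 280e9, "tokens": 300e9},
}

summary = {}
for name, m in models.items():
    flops = training_flops(m["params"], m["tokens"])
    ratio = m["tokens"] / m["params"]  # tokens of data per parameter
    summary[name] = (flops, ratio)
    print(f"{name}: {flops:.2e} FLOPs, {ratio:.1f} tokens per parameter")
```

Under these assumptions the two training runs cost roughly the same (on the order of 5e23 to 6e23 operations), but Chinchilla sees about 20 tokens per parameter while Gopher sees about one.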

Why it matters

A smaller model that performs as well costs less every time it answers, which lowers the ongoing cost of running it. This is one reason many capable modern models are compact rather than enormous: raw size alone does not determine quality, and the amount of training data matters just as much.

Bottom line

For any given budget, balance size and data rather than chasing the biggest model.

References