What is the Chinchilla scaling result?

June 1, 2026 · 4 min read

[Figure: Chinchilla scaling. "Balance beats piling on." For a fixed compute budget, scale size and data together. Labels: Model Size, Training Data; level = best result. Tip it either way (too big, or too little data) and you waste the same budget.]

Definition

The Chinchilla result is a 2022 DeepMind finding that, for a fixed training compute budget, language models perform best when model size and training data grow together, at roughly 20 training tokens per parameter.
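
As a rough sketch of what that ratio implies, the snippet below splits a compute budget between parameters and training tokens. It assumes the common approximation that training costs about 6 FLOPs per parameter per token (C ≈ 6 · N · D), which this article does not state, and it treats the 20-per-parameter ratio as a fixed constant. The function name is illustrative only.

```python
import math

TOKENS_PER_PARAM = 20      # rule of thumb from the definition above
FLOPS_PER_PARAM_TOKEN = 6  # assumption: training compute C ~= 6 * N * D


def compute_optimal_split(compute_flops: float) -> tuple[float, float]:
    """Return (parameters, tokens) that spend compute_flops at 20 tokens per parameter."""
    # C = 6 * N * D and D = 20 * N  =>  C = 120 * N**2
    params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


params, tokens = compute_optimal_split(5.88e23)
print(f"{params:.2e} parameters, {tokens:.2e} tokens")
```

Under these assumptions, a budget of about 5.88 × 10^23 FLOPs works out to roughly 70 billion parameters and 1.4 trillion training tokens.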

How it works

DeepMind trained Chinchilla (70 billion parameters, 1.4 trillion tokens) and compared it with Gopher, a model four times larger but trained on far less data[3]. With the same compute budget, the smaller Chinchilla outperformed Gopher, and it also beat GPT-3 across many benchmarks[2]. Better-fed beat bigger.
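
A back-of-the-envelope check shows why the comparison is a fair one. The sketch below assumes training compute of roughly 6 FLOPs per parameter per training unit, and it uses Gopher's reported figures of 280 billion parameters and about 300 billion training tokens, which are not given in this article.

```python
FLOPS_PER_PARAM_TOKEN = 6  # assumption: training compute ~= 6 * N * D

# (parameters, training tokens); Gopher's figures are not from this article
models = {
    "Chinchilla": (70e9, 1.4e12),
    "Gopher": (280e9, 300e9),
}

summary = {}
for name, (params, tokens) in models.items():
    summary[name] = {
        "train_flops": FLOPS_PER_PARAM_TOKEN * params * tokens,
        "tokens_per_param": tokens / params,
    }
    print(
        f"{name}: {summary[name]['train_flops']:.2e} FLOPs, "
        f"{summary[name]['tokens_per_param']:.1f} tokens per parameter"
    )
```

Both estimates land between 5 × 10^23 and 6 × 10^23 FLOPs, so the budgets are comparable. The difference is in how the budget was spent: about 20 tokens per parameter for Chinchilla versus about 1 for Gopher.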

Why it matters

A smaller model that performs as well costs less every time it answers, which lowers the ongoing cost of running AI. This is one reason many capable modern models are compact rather than enormous: the value comes from matching size to data, not from raw size alone.
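
The per-answer saving can be estimated with a simple sketch. It assumes a forward pass costs about 2 FLOPs per parameter for each generated token, a common rule of thumb that this article does not state, and it ignores memory, batching, and hardware effects. The model sizes are the 70 billion and 280 billion parameter examples discussed above.

```python
FLOPS_PER_PARAM_PER_TOKEN = 2  # assumption: forward pass ~= 2 * N FLOPs per token


def inference_flops(params: float, tokens_generated: int) -> float:
    """Estimate the compute needed to generate tokens_generated tokens."""
    return FLOPS_PER_PARAM_PER_TOKEN * params * tokens_generated


small = inference_flops(70e9, 1_000)   # 70-billion-parameter model
large = inference_flops(280e9, 1_000)  # model four times larger
ratio = large / small
print(f"The larger model needs about {ratio:.0f}x the compute per answer")
```

Under this estimate, cost per answer scales directly with parameter count, so a model a quarter of the size is about a quarter of the compute on every request.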

Bottom line

For any given budget, balance size and data rather than chasing the biggest model.

References

  1. Training Compute-Optimal Large Language Models. Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, et al. DeepMind. arxiv.org
  2. An empirical analysis of compute-optimal large language model training. Google DeepMind. deepmind.google
  3. Chinchilla (language model). Wikipedia. en.wikipedia.org
  4. Chinchilla Scaling, Compute-Optimal Training and the 20-Token-Per-Parameter Rule. AI Tower. ai.towerofrecords.com