
What is distillation?

June 1, 2026 · 4 min read

[Figure: Distillation. The master (big, slow, costly) teaches the recipe to the apprentice (small, fast, cheap), who then cooks nearly every meal for the customers. A small model learns to mimic a big one: most of the quality, a fraction of the cost.]

Definition

Distillation trains a smaller, cheaper AI model to copy a larger one, so it does similar work at lower cost and higher speed.
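To make "copy a larger one" concrete, here is a minimal sketch of a soft-target loss in the style described in reference 3: the student is penalized when its output probabilities differ from the teacher's, with both softened by a temperature. The function names, the example scores, and the temperature of 2.0 are illustrative assumptions, not taken from any particular vendor's training setup.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature softens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from teacher to student on softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical scores over three possible answers (assumed values).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different answer

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

Training adjusts the student's weights to push this loss down, so `loss_close` comes out smaller than `loss_far`. Real pipelines often combine it with an ordinary loss on labeled data.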

At a glance

Why it matters

Big models need costly servers, and providers typically charge per request. A distilled model does similar work more cheaply and quickly, and some are small enough to run on a laptop. The tradeoff is usually a drop in quality, most noticeable on the hardest tasks.

Where you see it

Vendors often sell smaller “mini,” “lite,” or “flash” versions of their top models, many of them distilled; DeepSeek built competitive models this way[2]. A cheaper provider tier often, though not always, means a distilled model.

Bottom line

Distillation gives you most of a big model’s quality at a small model’s price.

Connects to: Computer Science, Economics

References

  1. What is knowledge distillation? IBM. www.ibm.com
  2. How Distillation Makes AI Models Smaller and Cheaper. Quanta Magazine. www.quantamagazine.org
  3. Distilling the Knowledge in a Neural Network. Geoffrey Hinton, Oriol Vinyals, Jeff Dean. arXiv. arxiv.org
  4. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. arXiv. arxiv.org