
What is MMLU?

June 1, 2026 · 4 min read

[Figure: MMLU, Massive Multitask Benchmark. One test, 57 subject booklets covering fields from math to law to medicine (examples shown: Mathematics, Law, Medicine, History). The score is the share of questions answered right. 25% is the random-guess floor for 4-choice questions; below 25% is worse than guessing.]

Definition

MMLU is a standardized AI exam of about 16,000 multiple-choice questions across 57 subjects that scores how broadly knowledgeable a model is.
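Because the score is simply the share of questions answered correctly, the arithmetic is easy to sketch. The Python below is a minimal illustration, not the official evaluation code: the `mmlu_scores` function and the `(subject, predicted, correct)` record format are assumptions made for this example, and the sample answers are invented.

```python
from collections import defaultdict

# Each question has 4 choices, so blind guessing scores about 1 in 4.
RANDOM_GUESS_FLOOR = 1 / 4


def mmlu_scores(records):
    """Return (overall accuracy, per-subject accuracy).

    `records` is a hypothetical format chosen for this sketch: an iterable
    of (subject, predicted_letter, correct_letter) tuples.
    """
    per_subject = defaultdict(lambda: [0, 0])  # subject -> [right, total]
    for subject, predicted, correct in records:
        per_subject[subject][0] += int(predicted == correct)
        per_subject[subject][1] += 1

    total = sum(t for _, t in per_subject.values())
    if total == 0:
        raise ValueError("no graded questions supplied")

    right = sum(r for r, _ in per_subject.values())
    overall = right / total  # share of all questions answered right
    by_subject = {s: r / t for s, (r, t) in per_subject.items()}
    return overall, by_subject


# Invented sample: four graded answers across three subjects.
sample = [
    ("law", "A", "A"),
    ("law", "B", "C"),
    ("medicine", "D", "D"),
    ("mathematics", "C", "C"),
]

overall, by_subject = mmlu_scores(sample)
print(f"overall: {overall:.0%} (guessing floor: {RANDOM_GUESS_FLOOR:.0%})")
for subject, accuracy in sorted(by_subject.items()):
    print(f"{subject}: {accuracy:.0%}")
```

Keeping a per-subject breakdown alongside the headline number shows where a model is weak, which a single overall percentage can hide.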

At a glance

Why it matters

A higher MMLU score is shorthand for broad competence across many fields, so vendors quote it heavily (the dataset has 100M+ downloads)[1][4]. For buyers comparing models from vendors such as OpenAI, Anthropic, and Google, it is a useful first filter on general knowledge[3].

What it does not tell you

MMLU tests only book knowledge. It says nothing about whether a model can match your brand voice, work with your own documents, avoid made-up answers, or stay affordable and fast at scale. A model can ace it and still fumble your customer emails.

Bottom line

Treat MMLU as a quick report card for general knowledge, not the final word; the model that wins on your own tasks is the one worth paying for.

Connects to: Computer Science, Economics

References

  1. MMLU. Wikipedia. en.wikipedia.org
  2. Measuring Massive Multitask Language Understanding. Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt. arXiv / ICLR 2021. arxiv.org
  3. What is MMLU? LLM Benchmark Explained and Why It Matters. DataCamp. www.datacamp.com
  4. MMLU Benchmark (Massive Multi-task Language Understanding). Klu. klu.ai