Sapiens
Research

What is the ARC-AGI benchmark?

Published June 1, 2026 · 5 min read

[Figure: "ARC-AGI · What it measures". A pop quiz, not an open-book exam. Other tests reward memorized answers; ARC-AGI rewards reasoning it out. Left panel, "Most AI tests · open book": memorized knowledge, easy test. Right panel, "ARC-AGI · brand-new puzzle": two worked examples, reason it out now. No two puzzles repeat, so you can't have studied; you must figure out the rule from a couple of hints.]

Definition

ARC-AGI is a benchmark of small colored-grid puzzles that tests whether an AI can figure out brand-new rules from a few examples instead of relying on memorized data.

At a glance

  • Each puzzle shows a few input-output grids; the AI must infer the hidden rule and apply it - something most people do easily.
  • It measures on-the-fly reasoning, not the fact-recall most AI benchmarks reward.
  • ARC-AGI-2 (March 2025) is far harder for machines: average humans score ~60%, while top AI models scored under 5% at launch.
  • A $1M annual ARC Prize exists; the $700K grand prize unlocks only above 85% and stays unclaimed.

What it tests

You see two or three examples of a grid transforming, then must produce the output for a fresh input. Each puzzle uses a different hidden rule with only a few examples[1], so it rewards genuine reasoning over memorization - a closer proxy for general intelligence than tests an AI can ace by reading the whole internet[2].
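To make the puzzle format concrete, here is a minimal Python sketch. It assumes a grid is a list of rows of color numbers and that the hidden rule comes from a tiny, hand-picked menu of transformations. Both the menu (`CANDIDATE_RULES`) and the example grids are hypothetical and invented for illustration; real ARC-AGI puzzles do not draw their rules from a fixed list, which is exactly what makes them hard.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]  # each cell holds a color index

def flip_horizontal(grid: Grid) -> Grid:
    """Mirror every row left-to-right."""
    return [list(reversed(row)) for row in grid]

def flip_vertical(grid: Grid) -> Grid:
    """Reverse the order of the rows."""
    return [list(row) for row in reversed(grid)]

def transpose(grid: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]

# Hypothetical menu of rules; a real solver cannot assume such a list exists.
CANDIDATE_RULES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}

def infer_rule(examples: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule consistent with every worked example."""
    for name, rule in CANDIDATE_RULES.items():
        if all(rule(inp) == out for inp, out in examples):
            return name
    return None

# Two worked examples (input, output), then a fresh input to solve.
examples = [
    ([[1, 0, 0], [0, 2, 0]], [[0, 0, 1], [0, 2, 0]]),
    ([[3, 3, 0], [0, 0, 4]], [[0, 3, 3], [4, 0, 0]]),
]
test_input = [[5, 0], [0, 6]]

rule_name = infer_rule(examples)
prediction = CANDIDATE_RULES[rule_name](test_input) if rule_name else None
print(rule_name, prediction)
```

Only the horizontal flip explains both worked examples, so the sketch applies it to the fresh input. The hard part in the real benchmark is that each puzzle needs a rule nobody listed in advance.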

Why it matters

A big jump signals real progress: OpenAI’s o3 hit 75.7% (up to 87.5% with heavy compute) on ARC-AGI-1 in late 2024[3]. But the same model fell to roughly 3% on the harder ARC-AGI-2. That drop is a reality check that AI still struggles with truly novel problems, and it is worth keeping in mind when judging vendor claims[4].

The scoreboard

The non-profit ARC Prize Foundation runs a yearly Kaggle contest with a strict compute cap to block brute force[5]. The best 2025 entry reached only ~24%, so the $700K grand prize stays unclaimed.
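The gap between ~24% and the 85% bar can be sketched as simple arithmetic. The Python below assumes all-or-nothing scoring, where a puzzle counts only if a submitted grid matches the expected grid exactly. The function names, the list-of-attempts shape, and the 100-puzzle tally are hypothetical illustrations, not the contest's actual evaluation code.

```python
from typing import List

Grid = List[List[int]]

# From the article: the grand prize unlocks only above 85%.
GRAND_PRIZE_THRESHOLD = 0.85

def task_solved(attempts: List[Grid], expected: Grid) -> bool:
    """A puzzle counts only if some attempt matches the expected grid exactly."""
    return any(attempt == expected for attempt in attempts)

def benchmark_score(results: List[bool]) -> float:
    """Fraction of puzzles solved; no partial credit for nearly-right grids."""
    return sum(results) / len(results) if results else 0.0

def grand_prize_unlocked(score: float) -> bool:
    """True only when the score clears the 85% bar."""
    return score > GRAND_PRIZE_THRESHOLD

# One cell wrong means the whole puzzle is wrong (assumed exact-match scoring).
expected = [[0, 5], [6, 0]]
near_miss = [[0, 5], [6, 1]]
print(task_solved([near_miss], expected))            # False
print(task_solved([near_miss, expected], expected))  # True

# Hypothetical tally: 24 of 100 puzzles solved, roughly the best 2025 entry.
results = [True] * 24 + [False] * 76
score = benchmark_score(results)
print(f"{score:.0%}", grand_prize_unlocked(score))   # 24% False
```

Under these assumptions a near-miss earns nothing, which is one reason scores on this kind of benchmark climb slowly.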

Bottom line

Watch ARC-AGI scores as a grounded signal of whether AI can reason on the fly - and treat the unclaimed grand prize as evidence that human-level reasoning has not yet arrived.

References

  1. What is ARC-AGI? ARC Prize Foundation. arcprize.org
  2. On the Measure of Intelligence. Francois Chollet. arXiv, arxiv.org
  3. OpenAI o3 Breakthrough High Score on ARC-AGI-Pub. ARC Prize Foundation. arcprize.org
  4. ARC-AGI-2: A New Challenge for Frontier AI Reasoning Systems. Francois Chollet, ARC Prize team. arXiv, arxiv.org
  5. Announcing ARC-AGI-2 and ARC Prize 2025. ARC Prize Foundation. arcprize.org
