What are dangerous capability evaluations?


Published June 1, 2026

Definition

A structured test of the most harm a powerful AI could do if pushed to its limit, used to decide whether it is safe to release.

At a glance

Measures the model’s maximum ability, not its average behavior — testers push it to do its worst.
Focuses on high-stakes harms: chemical, biological, radiological, and nuclear (CBRN) weapons, offensive cyber operations, AI self-improvement, and persuasion.
Acts as a release gate: if a model crosses a capability threshold, it ships only after stronger safeguards are shown to reduce the risk.
Now formal policy at Anthropic, OpenAI, and Google DeepMind.

How it works

Instead of asking how a model usually behaves, testers ask what harm a determined bad actor could extract from it. They give it tools, let it reason step by step, and sample many attempts to draw out its true capability ceiling^[2]. A 2024 Google DeepMind study grouped the dangers into four areas: persuasion and deception, cybersecurity, self-proliferation, and self-reasoning^[1]; industry frameworks add CBRN weapons uplift^[4].
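Why sample many attempts? One common way to summarize repeated tries is the pass@k metric, borrowed from code-generation benchmarks: the estimated chance that at least one of k attempts succeeds. The sketch below is illustrative only and is not any lab's actual scoring method; the task, the 20 attempts, and the 3 successes are hypothetical numbers.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k attempts succeeds) from c successes in n samples.

    Uses the unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every possible set of k attempts contains at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical results: 20 sampled attempts at one task, 3 of them succeeded.
n_attempts, n_successes = 20, 3
average = pass_at_k(n_attempts, n_successes, 1)   # equals the plain success rate, 3 / 20
ceiling = pass_at_k(n_attempts, n_successes, 10)  # chance that 10 tries yield a success
print(f"pass@1 = {average:.2f}, pass@10 = {ceiling:.2f}")  # pass@1 = 0.15, pass@10 = 0.89
```

A model that succeeds only 15% of the time on a single try can still succeed about 89% of the time for someone willing to try ten times, which is why these evaluations look at the ceiling rather than the average.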

How results are used

Each lab sets capability thresholds (Anthropic calls its tiers AI Safety Levels). Cross one, and the model is not released until stronger safeguards are shown to cut the risk^[3]. The evaluation decides whether a model ships, ships with guardrails, or stays locked down.
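The three-way outcome described above can be written as a simple rule. This is a hypothetical sketch, not how any particular lab implements its policy: the domain names, scores, and numeric thresholds are invented for illustration, and published frameworks generally define thresholds in more qualitative terms.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    SHIP = "ships"
    SHIP_WITH_SAFEGUARDS = "ships with guardrails"
    HOLD = "stays locked down"


@dataclass
class DomainResult:
    domain: str                 # illustrative label, e.g. "cyber"
    score: float                # hypothetical evaluation score in [0, 1]
    threshold: float            # hypothetical capability threshold for this domain
    safeguards_validated: bool  # were stronger safeguards shown to cut this risk?


def release_decision(results: list[DomainResult]) -> Decision:
    # Domains where the model reached or passed its capability threshold.
    crossed = [r for r in results if r.score >= r.threshold]
    if not crossed:
        return Decision.SHIP
    # Every crossed threshold needs validated safeguards before release.
    if all(r.safeguards_validated for r in crossed):
        return Decision.SHIP_WITH_SAFEGUARDS
    return Decision.HOLD


results = [
    DomainResult("cbrn", score=0.12, threshold=0.50, safeguards_validated=False),
    DomainResult("cyber", score=0.64, threshold=0.60, safeguards_validated=True),
]
print(release_decision(results).value)  # ships with guardrails
```

Requiring validated safeguards for every crossed threshold, rather than most of them, mirrors the idea that a single unmitigated high-risk capability is enough to block release.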

Why it matters

This is the AI industry’s closest thing to a pre-market safety inspection. For a business, a vendor’s published safety framework and dangerous-capability testing are a practical signal that the vendor is managing risks that could otherwise land on you.

Bottom line

These tests probe an AI model’s worst-case potential before launch; published results are a quick sign that your vendor checked the ceiling of risk first.

References

[1] Evaluating Frontier Models for Dangerous Capabilities. Mary Phuong, Matthew Aitchison, et al. (Google DeepMind). arXiv (arxiv.org).
[2] Dangerous Capability Evaluations. AI Safety Atlas (ai-safety-atlas.com).
[3] Anthropic's Responsible Scaling Policy. Anthropic (www.anthropic.com).
[4] Frontier Capability Assessments. Frontier Model Forum (www.frontiermodelforum.org).
