Definition

A company’s own public commitment to strengthen its AI safety measures as its models become more capable, and not to release a model until its worst-case risks have been shown to be acceptably low.^[1]

At a glance

Voluntary and self-imposed: the company writes and publishes the rules, not a government regulator.
Works in tiers called AI Safety Levels (ASL), loosely modeled on laboratory biosafety levels. Earlier frontier models were deployed under ASL-2; the stricter ASL-3 safeguards went live in May 2025.^[3]
Anthropic coined the term in 2023; OpenAI and Google DeepMind maintain comparable frameworks (the Preparedness Framework and the Frontier Safety Framework, respectively).^[4]
Not a guarantee: critics say the rules are non-binding and the company can loosen them.^[5]

How it works

Each tier is an “if-then” trigger: if a model crosses a dangerous capability threshold (for example, giving meaningful help in building a bioweapon), then specific safeguards must be in place before the model is deployed or trained further. As capability climbs, the required precautions get stricter. Version 3.0 (Feb 2026) adds a public Frontier Safety Roadmap and regular risk reports with outside expert review.^[2]
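The if-then logic can be sketched in a few lines of Python. This is a minimal illustration only: the tier names come from the text above, but the numeric thresholds, the single "capability score", and the safeguard names are hypothetical placeholders, not values taken from any published policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capability_threshold: float          # hypothetical evaluation score in [0, 1]
    required_safeguards: frozenset[str]  # hypothetical safeguard labels


# Illustrative tiers only; real policies define thresholds qualitatively.
TIERS = [
    Tier("ASL-2", 0.0, frozenset({"baseline_security", "usage_policy"})),
    Tier("ASL-3", 0.5, frozenset({"baseline_security", "usage_policy",
                                  "hardened_security", "deployment_safeguards"})),
]


def required_tier(score: float) -> Tier:
    """Return the strictest tier whose threshold the score has crossed."""
    if score < 0:
        raise ValueError("score must be non-negative")
    applicable = [t for t in TIERS if score >= t.capability_threshold]
    return max(applicable, key=lambda t: t.capability_threshold)


def may_proceed(score: float, safeguards_in_place: set[str]) -> tuple[bool, frozenset[str]]:
    """IF the model crosses a threshold, THEN every safeguard for that tier
    must already be in place before deployment or further training."""
    tier = required_tier(score)
    missing = tier.required_safeguards - safeguards_in_place
    return (not missing, missing)


# A model that crosses the hypothetical ASL-3 threshold with only ASL-2 safeguards.
ok, missing = may_proceed(0.7, {"baseline_security", "usage_policy"})
print(ok, sorted(missing))
```

In this sketch the gate fails closed: a model that crosses a threshold is blocked until every safeguard for that tier is present, which mirrors the "safeguards first, then deployment" ordering described above.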

Why it matters

These policies influence which AI tools reach the market and how much weight their safety claims deserve. An RSP is a useful signal of a vendor’s seriousness, but it is only one input: keep doing your own due diligence.

Bottom line

A real safety discipline, but because it is voluntary and self-graded, it signals seriousness rather than guaranteeing safety.

References