Definition
A company’s own public commitment to tighten its AI safety and security measures as its models grow more capable, and to hold back a model until the company judges that its worst-case risks have been reduced to an acceptable level.[1]
At a glance
- Voluntary and self-imposed: the company writes and publishes the rules, not a government regulator.
- Works in tiers called AI Safety Levels (ASL), loosely modeled on lab biosafety levels. ASL-2 has been the baseline for frontier models; Anthropic activated the tougher ASL-3 protections in May 2025.[3]
- Anthropic coined the term in 2023; OpenAI (Preparedness Framework) and Google DeepMind (Frontier Safety Framework) run parallel frameworks.[4]
- Not a guarantee: critics say the rules are non-binding and the company can loosen them.[5]
How it works
Each tier is an “if-then” trigger: if a model crosses a dangerous capability threshold (say, meaningfully helping build a bioweapon), then specific safeguards must be in place before it ships or trains further. As capability climbs, the required precautions get stricter. Version 3.0 (Feb 2026) adds a public Frontier Safety Roadmap and regular risk reports with outside expert review.[2]
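The if-then logic can be sketched in a few lines of Python. This is only an illustration: the tier contents, the safeguard names, and the `bio_uplift` threshold label are hypothetical placeholders, not Anthropic's actual criteria or tooling, and a real policy relies on detailed evaluations and human judgment rather than a lookup table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    required_safeguards: frozenset[str]


# Hypothetical tiers and safeguard names, invented for illustration only.
TIERS = {
    "ASL-2": Tier("ASL-2", frozenset({"baseline_security", "usage_policy"})),
    "ASL-3": Tier(
        "ASL-3",
        frozenset(
            {
                "baseline_security",
                "usage_policy",
                "hardened_weight_security",
                "misuse_classifiers",
            }
        ),
    ),
}


def required_tier(crossed_thresholds: set[str]) -> Tier:
    """The 'if' half: any crossed capability threshold raises the tier."""
    return TIERS["ASL-3"] if crossed_thresholds else TIERS["ASL-2"]


def may_proceed(
    crossed_thresholds: set[str], safeguards_in_place: set[str]
) -> tuple[bool, list[str]]:
    """The 'then' half: proceed only if every required safeguard is in place.

    Returns (ok, missing), where missing lists the safeguards still needed.
    """
    tier = required_tier(crossed_thresholds)
    missing = sorted(tier.required_safeguards - safeguards_in_place)
    return (not missing, missing)


if __name__ == "__main__":
    current = {"baseline_security", "usage_policy"}
    print(may_proceed(set(), current))           # no threshold crossed
    print(may_proceed({"bio_uplift"}, current))  # threshold crossed, gaps remain
```

The point of the sketch is the ordering: the capability check decides which tier applies, and the model is blocked until that tier's safeguards exist, rather than safeguards being added after release.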
Why it matters
These policies shape which AI tools reach the market and how much weight their safety claims deserve. For a buyer, an RSP is a useful signal of a vendor’s seriousness, not proof of safety: treat it as one input and keep doing your own due diligence.
Bottom line
A real safety discipline, but because it is voluntary and self-graded, it signals seriousness rather than guaranteeing safety.
References
- [1] Anthropic's Responsible Scaling Policy. Anthropic, www.anthropic.com
- [2] Responsible Scaling Policy Version 3.0. Anthropic, www.anthropic.com
- [3] Activating AI Safety Level 3 protections. Anthropic, www.anthropic.com
- [4] Common Elements of Frontier AI Safety Policies. METR, metr.org
- [5] How Anthropic's AI Safety Framework Misses the Mark. The Midas Project, www.themidasproject.com