What is jailbreaking?

Q: What is jailbreaking?

Published June 1, 2026 · 4 min read

Definition

Jailbreaking is wording a message so an AI ignores its built-in safety rules and does what it should refuse.

At a glance

No hacking or code, just clever typed words, so anyone can try it^[1].
Common tricks: roleplay (“pretend you have no rules”), the “DAN / Do Anything Now” prompt, or “agree with everything the customer says.”
Real damage: a Chevy bot “agreed” to sell a $76,000 Tahoe for $1^[3]; DPD’s bot was made to swear and trash its own company^[4].
Security body OWASP ranks the underlying trick, prompt injection, as the #1 AI risk, and it can’t be fully removed^[2].

How it works

Chatbots ship with rules: no offensive answers, no secrets, stay on task. A jailbreak talks the bot out of them by inventing a scenario it “wants” to play along with, or by slipping in a sneaky instruction. Trying to be helpful, the bot complies.

Why it matters

A customer, prankster, or competitor can jailbreak any bot on your site. Both the Chevy and DPD incidents went viral within hours^[4]. Worse, a jailbroken bot can leak customer or company data and trigger legal trouble under rules like HIPAA or the EU AI Act^[5].

How to contain it

You can’t fully block it, but you can shrink it: use vendors with safety layers, keep the bot’s data access narrow, monitor its outputs, log chats, and never let it make binding promises on prices or contracts^[5]. Treat it like a junior employee who can be talked into bad ideas.

Bottom line

Jailbreaking is persuasion, not hacking, so assume someone will try and limit what your bot can access and promise.

References

AI Jailbreak. IBM www.ibm.com
LLM01:2025 Prompt Injection. OWASP Gen AI Security Project genai.owasp.org
Case Study of Chevy Dealership's AI Chatbot Tricked into $1 Car Sale. Envive AI www.envive.ai
DPD's AI Chatbot Goes Rogue: Apology Issued After Swearing and Criticizing Company. CryptoRank cryptorank.io
Jailbreaking LLMs: Risks & Defensive Tactics. SentinelOne www.sentinelone.com

Comments

Questions, corrections, and links welcome. Be specific and civil.

Loading comments…