Definition
Jailbreaking is wording a message so an AI ignores its built-in safety rules and does what it should refuse.
At a glance
- No hacking or code, just clever typed words, so anyone can try it[1].
- Common tricks: roleplay (“pretend you have no rules”), the “DAN / Do Anything Now” prompt, or “agree with everything the customer says.”
- Real damage: a Chevy bot “agreed” to sell a $76,000 Tahoe for $1[3]; DPD’s bot was made to swear and trash its own company[4].
- Security body OWASP ranks the underlying trick, prompt injection, as the #1 AI risk, and it can’t be fully removed[2].
How it works
Chatbots ship with rules: no offensive answers, no secrets, stay on task. A jailbreak talks the bot out of them by inventing a scenario it “wants” to play along with, or by slipping in a sneaky instruction. Trying to be helpful, the bot complies.
Why it matters
A customer, prankster, or competitor can jailbreak any bot on your site. Both the Chevy and DPD incidents went viral within hours[4]. Worse, a jailbroken bot can leak customer or company data and trigger legal trouble under rules like HIPAA or the EU AI Act[5].
How to contain it
You can’t fully block it, but you can shrink it: use vendors with safety layers, keep the bot’s data access narrow, monitor its outputs, log chats, and never let it make binding promises on prices or contracts[5]. Treat it like a junior employee who can be talked into bad ideas.
Bottom line
Jailbreaking is persuasion, not hacking, so assume someone will try and limit what your bot can access and promise.
References
- AI Jailbreak. IBM www.ibm.com
- LLM01:2025 Prompt Injection. OWASP Gen AI Security Project genai.owasp.org
- Case Study of Chevy Dealership's AI Chatbot Tricked into $1 Car Sale. Envive AI www.envive.ai
- DPD's AI Chatbot Goes Rogue: Apology Issued After Swearing and Criticizing Company. CryptoRank cryptorank.io
- Jailbreaking LLMs: Risks & Defensive Tactics. SentinelOne www.sentinelone.com
Comments
Questions, corrections, and links welcome. Be specific and civil.