Definition

Red-teaming is a planned, authorized attack on your own systems, staff, or AI, run to expose weak spots before a real adversary finds them.

At a glance

A friendly attack you commission on yourself, meant to find blind spots, not cause harm.
The name comes from military war games: the ‘red team’ plays the enemy against the defending ‘blue team’^[3].
It tests your whole organization, including people and procedures, often in stealth so staff don’t know.
AI red-teaming applies the same idea to chatbots and assistants.

How it works

A trusted group is authorized to behave like a real adversary, attacking your systems, staff, and procedures to surface problems you can’t see from inside^[2]. The U.S. formalized this during the Cold War with RAND simulations, naming the attacker ‘red’ after the Soviet Union.

Red team vs. a basic security test

A penetration test is narrow and known: testers check one website or network, with your IT team watching. Red-teaming is wider and quieter; no path is off the table, including tricking employees, and your staff are often kept in the dark^[4]. Smaller businesses usually start with pen testing, then graduate to red-teaming.

Why it matters now: AI

Testers deliberately try to manipulate AI tools, using ‘jailbreaks’ or hidden ‘prompt injection,’ to see if they leak data or behave unsafely^[1]. Because AI fails in unpredictable ways, red-teaming it before launch finds those failures first, not in a headline^[5].

Bottom line

A friendly attack you commission on yourself, so a real adversary never gets the first try.

What is red-teaming?

At a glance

How it works

Red team vs. a basic security test

Why it matters now: AI

Bottom line

References