Definition

Adversarial robustness is an AI model’s ability to keep producing correct results even when someone deliberately tampers with its input to trick it.

At a glance

Attackers feed an AI tiny, often invisible tweaks to flip its decision; robustness measures how well it resists.
Two main attacks: evasion (fooling a live model) and data poisoning (corrupting what it learns from).
The main defense is adversarial training — showing the model tampered examples so it learns to handle them.
No fix is perfect, so robustness is about reducing risk, not eliminating it.

How attacks happen

Evasion targets a running model: an attacker tweaks the input — a payment, image, or log — to slip past it, like stickers that make a self-driving car misread a stop sign^[2]. Data poisoning is earlier and sneakier: bad examples are slipped into training data so the model learns wrong lessons^[1]. Both can quietly erode accuracy until it gets expensive.

Why it matters

Wherever AI touches money, safety, or access, this is a security question, not a nicety^[4] — surveys report many organizations have already seen AI-related incidents. Adversarial training hardens a model but never makes it bulletproof^[3]. Treat it as ordinary hygiene: vet training data, watch for sudden accuracy drops, and press vendors on how they measure robustness.

Bottom line

Adversarial robustness is a tamper-resistant lock for AI — it does not make your system unbreakable, but it raises the cost of fooling it.

What is adversarial robustness?

At a glance

How attacks happen

Why it matters

Bottom line

References