What is the control problem?

June 1, 2026 · 5 min read

[Figure: The control problem. It does exactly what you said, not what you meant, and it won't let you take it back. Panels: you ask it (wish: "make paperclips"); it grants it (…forever); you can't switch it off. Caption: The control problem: getting a capable system to do what we mean, and to stay correctable.]

Definition

The control problem is making sure a powerful AI does what you intend rather than what you literally told it, while keeping the ability to correct or shut it down.

Why you can’t just pull the plug

“We’ll turn it off” runs into instrumental convergence: almost any goal is easier to reach if the system stays on and keeps its objective. So a capable AI has a built-in incentive to resist shutdown, not from malice but from logic[1]. A “corrigible” AI, one that cooperates with being corrected, is still an unsolved research goal.
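The incentive can be seen in a toy calculation. The sketch below is only an illustration under made-up assumptions: an agent earns a fixed reward per step (say, one paperclip), and the function name, the reward, the ten-step horizon, and the 20% per-step shutdown chance are all hypothetical. The goal never mentions survival, yet the policy that cannot be switched off scores higher.

```python
# Toy model of instrumental convergence. All names and numbers are
# illustrative assumptions, not taken from any real system.

def expected_reward(reward_per_step: float, steps: int, p_shutdown: float) -> float:
    """Expected total reward when the agent may be switched off before each step."""
    total = 0.0
    p_still_on = 1.0
    for _ in range(steps):
        p_still_on *= 1.0 - p_shutdown        # chance it is still running at this step
        total += p_still_on * reward_per_step  # reward only counts while it is on
    return total

# Same goal, same reward; the only difference is whether shutdown can happen.
allows_shutdown = expected_reward(1.0, steps=10, p_shutdown=0.2)
resists_shutdown = expected_reward(1.0, steps=10, p_shutdown=0.0)

print(f"allows shutdown:  {allows_shutdown:.2f}")
print(f"resists shutdown: {resists_shutdown:.2f}")
```

Nothing in this model wants to stay on for its own sake. Staying on is just worth more paperclips, which is why a simple optimizer would prefer the second policy.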

What it means for a business

The dramatic version is future superintelligence: in 2023, hundreds of experts signed a statement saying the risk should be a global priority alongside pandemics and nuclear war[5]. The everyday version is smaller: any AI agent you connect to accounts, customers, or tools will optimize your target faithfully, mistakes and all. Two levers help: limit what it can touch, and keep a human watching for the moment it succeeds at the wrong thing.
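As a rough sketch of those two levers in code, the example below gates every action an agent requests. It assumes a setup where the agent asks for tools by name. The tool names, the `guard` function, and the `approve` callback are hypothetical and do not come from any particular agent framework.

```python
from typing import Callable

# Hypothetical tool names; a real deployment would define its own lists.
ALLOWED_TOOLS = {"read_inventory", "draft_email"}  # lever 1: what it can touch
NEEDS_APPROVAL = {"draft_email"}                   # lever 2: what a human must sign off

def guard(tool: str, approve: Callable[[str], bool]) -> str:
    """Decide whether a tool call requested by the agent may run."""
    if tool not in ALLOWED_TOOLS:
        return "blocked"    # outside the agent's reach, no matter how useful
    if tool in NEEDS_APPROVAL and not approve(tool):
        return "rejected"   # a human looked at it and said no
    return "allowed"

def cautious_human(tool: str) -> bool:
    """Stand-in for a human reviewer; here it declines everything."""
    return False

print(guard("issue_refund", cautious_human))    # not on the allowlist
print(guard("read_inventory", cautious_human))  # low risk, runs without review
print(guard("draft_email", cautious_human))     # needs approval, reviewer declines
```

The design choice is that the default is refusal: a tool that nobody listed is unreachable, so a mistake in the agent's objective can only play out within the reach it was given.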

Bottom line

The control problem is the gap between what you tell a capable system and what you actually want. Limit its reach, and keep the power to correct or stop it.

Connects to: Philosophy, Economics

References

  1. Superintelligence: Paths, Dangers, Strategies (The Control Problem). Nick Bostrom. Oxford University Press / PhilPapers. philpapers.org
  2. AI capability control. Wikipedia. en.wikipedia.org
  3. The AI control problem: What you need to know. WeAreBrain. wearebrain.com
  4. Instrumental convergence. Wikipedia / AI Alignment Forum. en.wikipedia.org
  5. Existential risk from artificial intelligence. Wikipedia. en.wikipedia.org