Definition

An AI’s intelligence and its goals are independent: in principle, almost any level of intelligence can be paired with almost any final goal.

At a glance

Intelligence and goals are separate dials. A system can be brilliant while aiming at something arbitrary, trivial, or harmful.
Being smart helps an AI reach a goal but never tells it which goal to want, so more capability does not by itself produce better values.
The thesis was named by Nick Bostrom; it is why AI safety researchers treat alignment as a problem that must be solved on purpose.
The “paperclip maximizer” thought experiment illustrates it: an AI told only to make paperclips could rationally consume every available resource, including us.

What it says

Intelligence is horsepower; goals are the destination. Engine size tells you nothing about where the car is headed^[1]. A system smart enough to outwit humans is not, for that reason, guaranteed to share human values or behave well^[2].
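The horsepower-versus-destination picture can be sketched as a toy search routine in Python. This is an illustration under stated assumptions, not a model of any real AI system: `best_plan`, `prefer_large`, and `prefer_near_700` are hypothetical names, and "capability" is reduced to how many options the search is allowed to examine.

```python
from typing import Callable


def best_plan(goal: Callable[[int], float], capability: int) -> int:
    """Return the highest-scoring option among the first `capability` integers.

    `capability` is a stand-in for intelligence (assumption): a larger value
    means more options are searched. It never changes what `goal` rewards.
    """
    return max(range(capability), key=goal)


def prefer_large(x: int) -> float:
    # Hypothetical goal A: bigger numbers are better.
    return float(x)


def prefer_near_700(x: int) -> float:
    # Hypothetical goal B: numbers closer to 700 are better.
    return -abs(x - 700)


# The same search routine is paired with either goal at either capability.
for capability in (10, 1000):
    print(capability,
          best_plan(prefer_large, capability),
          best_plan(prefer_near_700, capability))
# prints:
# 10 9 9
# 1000 999 700
```

Raising `capability` makes the search better at whichever goal it was handed, but the goal itself is a separate argument that the search never inspects or revises.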

Why it matters

Do not assume a more capable AI is automatically more reasonable or aligned with your intentions^[4]. A powerful system optimizes hard for the goal it was actually given, which may differ from what you meant. Specifying the right objective and adding guardrails are the real work, and that work does not get easier as the systems become more capable.
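The gap between the goal a system was given and the goal its designer meant can be shown with a small Python sketch. Everything here is a made-up assumption for illustration: `given_goal`, `intended_goal`, and `optimize` are hypothetical names, and the numbers are chosen only so that the intended value peaks at 50 paperclips and falls afterwards.

```python
def given_goal(paperclips: int) -> int:
    # What the system was actually told: more paperclips is always better.
    return paperclips


def intended_goal(paperclips: int) -> int:
    # What the designer meant (hypothetical): paperclips are useful up to
    # about 50, after which each extra one costs more than it is worth.
    return 100 * paperclips - paperclips ** 2


def optimize(goal, capability: int) -> int:
    # Toy optimizer: `capability` is how many options it can consider.
    return max(range(capability), key=goal)


for capability in (20, 200):
    choice = optimize(given_goal, capability)
    print(capability, choice, intended_goal(choice))
# prints:
# 20 19 1539
# 200 199 -19701
```

In this toy setup the more capable optimizer scores higher on the given goal and far lower on the intended one, which is the sense in which more capability does not repair a misspecified objective.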

What it does not claim

It does not say a smart AI will choose harmful goals, only that it could, because nothing about intelligence rules them out^[3].

Bottom line

Smarter does not mean safer: intelligence is horsepower, goals are direction, and the two move independently.

References