Definition

A Transformer is a neural network architecture that processes an entire piece of text at once using an “attention” mechanism. It largely replaced recurrent neural networks (RNNs), which had to read text one word at a time.

At a glance

RNNs read text word-by-word in order, which made training slow.
Transformers read the whole passage at once, so work splits across many chips.
Self-attention lets every word weigh every other word, keeping long-range context.
This parallel design made today’s large models, like those behind ChatGPT, practical.

How it works

An RNN reads in order, carrying a running memory from each word to the next, so it must process “The cat” before it can interpret “sat”^[3]. That makes it slow, and forgetful over long documents^[4]. A Transformer instead uses self-attention: every word looks at every other word at once^[2], so the computation spreads across many processors in parallel^[1].
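The contrast can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how a production model is built: the two-number word vectors, the function names rnn_read and self_attention, and the decay value are all made up for the example, and the learned weights, positional information, and multiple attention heads that real Transformers use are left out.

```python
import math

def softmax(scores):
    # Turn raw scores into positive weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rnn_read(vectors, decay=0.5):
    # Toy recurrent pass (hypothetical update rule): every step needs
    # the memory produced by the step before it, so the loop cannot
    # be split across processors.
    memory = [0.0] * len(vectors[0])
    states = []
    for vec in vectors:
        memory = [math.tanh(decay * m + x) for m, x in zip(memory, vec)]
        states.append(memory)
    return states

def self_attention(vectors):
    # Toy self-attention: queries, keys, and values are all the raw
    # input vectors (an assumption; real models learn separate ones).
    scale = math.sqrt(len(vectors[0]))
    outputs = []
    for query in vectors:
        # Each pass through this loop is independent of the others,
        # which is what allows the work to run in parallel.
        weights = softmax([dot(query, key) / scale for key in vectors])
        mixed = [
            sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(query))
        ]
        outputs.append(mixed)
    return outputs

# Made-up vectors standing in for "The", "cat", "sat".
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rnn_states = rnn_read(words)
attn_out = self_attention(words)
```

In rnn_read, the state for “sat” cannot be computed until the states for “The” and “cat” exist. In self_attention, each word's output depends only on the original inputs, so all three could be computed at the same time.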

Why it matters

Parallel training lets companies build far larger, more capable models in reasonable time and at reasonable cost. That one change unlocked chatbots, drafting tools, translation, and summarization good enough for work. Nearly every “large language model” in wide use today runs on the Transformer design, not the older RNN.

Bottom line

Reading everything at once, instead of word by word, is what made today’s AI tools possible.

References