What is a transformer?

Published June 1, 2026 · 4 min read

Definition

A transformer is the neural network architecture behind today’s language tools. It reads a whole passage at once and lets every word weigh every other word to work out meaning.

At a glance

The engine under ChatGPT, Claude, Gemini, Copilot, and many image and voice models, all built on one 2017 invention.
Its trick is ‘attention’: it reads the whole input at once and lets each word check which other words matter for context.
Doubling input length roughly quadruples the attention work, so longer documents and bigger context windows cost more.
You typically rent it through an API or product; you rarely build one yourself.

How it works

Earlier language models, such as recurrent networks, read text one word at a time and tended to lose track of the beginning by the time they reached the end. The 2017 paper ‘Attention Is All You Need’ changed that^[1]. The transformer reads the whole passage at once, and attention lets each word look at every other word to settle its meaning^[2]. That is how ‘mole’ resolves to an animal, a chemistry unit, or a skin spot, depending on its neighbors.
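
As a rough illustration of the attention step described above, here is a minimal Python sketch. It is a simplification under stated assumptions: the three words get invented two-number vectors, and the same vectors are reused as queries, keys, and values, whereas a real transformer learns separate projections for each and runs many attention heads at once. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over small lists of vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this word against every word, itself included.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Blend the value vectors according to the weights.
        blended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights


# Toy vectors (assumed, not taken from any real model).
tokens = ["the", "mole", "burrowed"]
vectors = [[0.1, 0.0], [1.0, 0.5], [0.9, 0.7]]

outputs, weights = attention(vectors, vectors, vectors)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how much one word attends to every word in the passage. With these made-up numbers, ‘mole’ puts more weight on ‘burrowed’ than on ‘the’, which is the kind of signal that would nudge its meaning toward the animal.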

Why it took over

It processes input in parallel, so it trains quickly on modern hardware and scales to very large models^[1]. And it is general: the same design handles text, code, images, and audio^[3]. That is why one architecture now underpins nearly every ‘large language model’ or ‘foundation model’ you hear about^[5].

What it means for you

Cost grows steeply with length: because every word is compared with every other word, twice the text means about four times the attention computation^[4]. So send the model only what it needs. And it predicts likely text, not checked facts, so it can be fluent and wrong. Use it for drafts and summaries with a human in the loop; don’t hand it final authority over legal, medical, or financial calls.
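
The ‘twice the text, four times the work’ rule can be checked with simple arithmetic. The sketch below counts word-to-word comparisons only; the function name and the token counts are illustrative assumptions, and a real bill also depends on parts of the model that grow linearly with length and on the provider’s pricing.

```python
def attention_pair_count(num_tokens):
    """Number of word-to-word comparisons in one full attention pass."""
    return num_tokens * num_tokens


baseline = attention_pair_count(1000)
for num_tokens in (1000, 2000, 4000):
    pairs = attention_pair_count(num_tokens)
    print(f"{num_tokens} tokens: {pairs} comparisons, "
          f"{pairs // baseline}x the baseline")
```

Under this simplified count, going from 1,000 to 2,000 tokens multiplies the comparisons by four, and going to 4,000 multiplies them by sixteen, which is why trimming the input is usually the cheapest optimization available.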

Bottom line

You rent this capability, you pay more as inputs grow, and you treat its confident output as a smart draft to verify.

References

[1] Attention Is All You Need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. arXiv (Google Brain / Google Research). arxiv.org
[2] Attention in transformers, step-by-step (Deep Learning, chapter 6). Grant Sanderson. 3Blue1Brown. www.3blue1brown.com
[3] What is a Transformer Model? IBM. www.ibm.com
[4] Transformers and Attention: How LLMs Actually Process Text. Q. V. Fagundes. DEV Community. dev.to
[5] Attention Is All You Need. Wikipedia. en.wikipedia.org
