Definition
A transformer is the type of AI behind today’s language tools, which reads a whole passage at once and lets every word weigh every other word to grasp meaning.
At a glance
- The engine under ChatGPT, Claude, Gemini, Copilot, and most image and voice models — one 2017 invention.
- Its trick is ‘attention’: it reads the whole input at once and lets each word check which other words matter for context.
- Doubling input length roughly quadruples the work, so longer documents and bigger context windows cost more.
- You rent it through an API or product; you never build one yourself.
How it works
Older AI read word by word and forgot the start by the end. The 2017 paper ‘Attention Is All You Need’ changed that[1]. The transformer reads the whole passage at once, and attention lets each word look at every other word to settle its meaning[2] — so ‘mole’ resolves to animal, chemistry unit, or skin spot from its neighbors.
Why it took over
It processes input in parallel, so it trains fast and scales huge[1]. And it is general: the same design handles text, code, images, and audio[3]. That is why one architecture now underpins nearly every ‘large language model’ or ‘foundation model’ you hear about[5].
What it means for you
Cost grows steeply with length — twice the text, about four times the computation[4] — so send the model only what it needs. And it predicts likely text, not checked facts, so it can be fluent and wrong. Use it for drafts and summaries with a human in the loop; don’t hand it final authority over legal, medical, or financial calls.
Bottom line
You rent this capability, you pay more as inputs grow, and you treat its confident output as a smart draft to verify.
References
- Attention Is All You Need — Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. arXiv (Google Brain / Google Research) arxiv.org
- Attention in transformers, step-by-step (Deep Learning, chapter 6) — Grant Sanderson. 3Blue1Brown www.3blue1brown.com
- What is a Transformer Model? IBM www.ibm.com
- Transformers and Attention: How LLMs Actually Process Text — Q. V. Fagundes. DEV Community dev.to
- Attention Is All You Need. Wikipedia en.wikipedia.org
Comments
Questions, corrections, and links welcome. Be specific and civil.