Definition
A technique that lets an AI weigh which words in the text matter most to each other, so it can track context even across far-apart words.
At a glance
- For each word, the model scores how relevant every other word is, then leans on the ones that matter[2].
- It links related words no matter how far apart they sit[1] — something older AI struggled with.
- Introduced in Google’s 2017 paper Attention Is All You Need, it created the Transformer architecture.
- It is the engine behind tools like ChatGPT, which weight each word to decide what to use[4].
How it works
Read “the company that the bank approved finally launched” and you connect “launched” to “company,” not “bank.” Attention gives AI that same skill: it views all words at once and directly ties related ones together[3], instead of reading one word at a time and forgetting earlier context.
Why it matters
It is why today’s tools can summarize a long document, draft an email in the right tone, or hold a coherent conversation. They work by weighing relevance, not true understanding — which explains both their strengths and their slips when context is unclear.
Bottom line
Attention lets a model decide which words matter most to each other, turning AI from a forgetful word-by-word reader into one that grasps context across whole documents.
References
- Attention Is All You Need — Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin. arXiv (Google Brain) arxiv.org
- What is an attention mechanism? IBM www.ibm.com
- Understanding attention in large language models. University of Michigan news.engin.umich.edu
- The Power of Paying Attention, How ChatGPT Understands Conversations — Sina Nazeri. Medium medium.com
Comments
Questions, corrections, and links welcome. Be specific and civil.