Definition

A context window is the maximum amount of text, measured in tokens, that an AI model can hold in view at one time, covering both what you send and what it writes back.^[1]

At a glance

It is the AI’s short-term working memory, not stored knowledge. Once a request ends, the model retains nothing from it.
Measured in tokens: 1,000 tokens is roughly 750 words. Your prompt, attached files, chat history, and the reply all share one budget.
Sizes range from 128K tokens to 1 million or more, but bigger is not automatically better.
You pay per token, both input and output, so the smallest context that does the job is usually the cheapest correct one.
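
The points above can be turned into rough arithmetic. The sketch below is illustrative only: it estimates tokens from a word count using the 750-words-per-1,000-tokens rule of thumb, checks whether the inputs plus a reserved reply fit in one window, and computes a cost. The function names are hypothetical, real tokenizers will give different counts, and the prices are caller-supplied placeholders rather than any provider's actual rates.

```python
WORDS_PER_1K_TOKENS = 750  # rule of thumb from the text; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count, rounded up."""
    words = len(text.split())
    return (words * 1000 + WORDS_PER_1K_TOKENS - 1) // WORDS_PER_1K_TOKENS


def fits_in_window(parts: list[str], reply_reserve: int, window: int) -> bool:
    """True if prompt, files, and history plus the reserved reply share one budget."""
    used = sum(estimate_tokens(part) for part in parts)
    return used + reply_reserve <= window


def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price_per_million: float,
                  out_price_per_million: float) -> float:
    """Cost of one request; both prices are hypothetical, caller-supplied values."""
    total = input_tokens * in_price_per_million + output_tokens * out_price_per_million
    return total / 1_000_000


if __name__ == "__main__":
    prompt = "word " * 750      # 750 words, about 1,000 tokens
    history = "word " * 1500    # 1,500 words, about 2,000 tokens
    print(estimate_tokens(prompt))
    print(fits_in_window([prompt, history], reply_reserve=1000, window=4000))
```

For real budgeting, the estimate would be replaced by the tokenizer that matches the model in use, since token counts differ by model and by language.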

How it works

The window is the AI’s desk, not its filing cabinet: the model can only reason about what is on the desk right now. When the desk fills, the oldest material is typically dropped, and the model can no longer see it^[5]. The model’s reply comes out of the same budget, so a huge input leaves little room for a long answer^[4].
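
One way a chat application might implement the desk analogy is sketched below: reserve room for the reply, then drop the oldest messages until the rest fits. This is a minimal sketch under stated assumptions, not how any particular product works. `count_tokens` is a hypothetical placeholder that counts one token per word, and `trim_history` is an illustrative name.

```python
from collections import deque


def count_tokens(message: str) -> int:
    # Placeholder counter (assumption): one token per whitespace-separated word.
    return len(message.split())


def trim_history(messages: list[str], window: int, reply_reserve: int) -> list[str]:
    """Drop the oldest messages until the remainder plus the reply reserve fits."""
    budget = window - reply_reserve
    if budget < 0:
        raise ValueError("reply_reserve is larger than the window")
    kept = deque(messages)
    total = sum(count_tokens(m) for m in kept)
    # Oldest messages sit at the left end, so they slide off first.
    while kept and total > budget:
        total -= count_tokens(kept.popleft())
    return list(kept)


if __name__ == "__main__":
    chat = ["a b c", "d e", "f g h i"]  # 3 + 2 + 4 = 9 placeholder tokens
    print(trim_history(chat, window=10, reply_reserve=3))
```

With a window of 10 and 3 tokens reserved for the reply, only 7 tokens of history fit, so the oldest message is dropped. A larger reply reserve pushes out more history, which is the trade-off described above.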

Why bigger is not always better

Models in 2026 offer 200K to 1 million tokens, enough to drop in a whole contract or codebase^[3]. But reliability suffers as the window fills: models use the start and end of a long window well and lose track of facts buried in the middle^[2]. The advertised size is optimistic too: a model rated for 200K often gets shaky closer to 130K^[3].
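
Both observations can be sketched in a few lines. The example below is an assumption-laden illustration, not a measured recipe: the 65 percent figure simply restates the 200K-to-130K example above, and `edge_first_order` is a hypothetical helper that places the most relevant items at the start and end of the input, leaving the least relevant in the middle.

```python
def usable_window(advertised: int, usable_percent: int = 65) -> int:
    """Working budget below the advertised size; 65% mirrors the 200K -> 130K example."""
    return advertised * usable_percent // 100


def edge_first_order(ranked: list) -> list:
    """Reorder items ranked most-relevant-first so the best ones sit at the edges."""
    front, back = [], []
    for index, item in enumerate(ranked):
        # Alternate: ranks 1, 3, 5... fill the front; ranks 2, 4... fill the back.
        (front if index % 2 == 0 else back).append(item)
    return front + back[::-1]


if __name__ == "__main__":
    print(usable_window(200_000))
    print(edge_first_order(["doc1", "doc2", "doc3", "doc4", "doc5"]))
```

In this ordering the top-ranked document opens the input, the second-ranked one closes it, and the weakest match lands in the middle, where a missed fact costs the least.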

Bottom line

Don’t chase the biggest window; feed the model the smallest, most relevant slice that answers the question.

References