Context window

The model's working memory: if it doesn't fit, it doesn't exist.

The analogy

Picture a study desk: it holds your notes, a book, and not much else. If you want to add a giant atlas, something has to come off the desk. The context window is that desk: the model can only “see” what fits on it at any given moment.

In detail

It's the maximum number of tokens the model can process at once: your prompt, the previous conversation, and its own answer all count toward the same limit. When a conversation exceeds that limit, older messages are typically truncated or summarized, and the model “forgets” them. Today's models handle anywhere from a few thousand to millions of tokens.
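To make the “forgetting” concrete, here is a minimal sketch of how a chat application might trim a conversation to fit the window. It rests on loud assumptions: count_tokens is a hypothetical stand-in that counts whitespace-separated words (real tokenizers split text differently), and fit_to_window simply drops the oldest messages, which is only one of several possible strategies.

```python
def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # Real tokenizers produce different (usually higher) counts.
    return len(text.split())


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message and stop at the first
    # message that would push the total over the budget.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "My name is Ada and I live in Turin.",
    "What is a token?",
    "A token is a small chunk of text.",
    "What is my name?",
]

visible = fit_to_window(history, max_tokens=15)
print(visible)
# ['A token is a small chunk of text.', 'What is my name?']
```

With a budget of 15, the first message no longer fits, so the model is asked for a name it can no longer see. In practice the budget would also need to leave room for the model's answer, since that counts toward the same limit.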

An example

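You paste a 300-page contract and ask about clause 2. If the document exceeds the window, the model may never have “read” that part. It's better to split the document into smaller pieces or to use a technique such as retrieval-augmented generation (RAG), which passes the model only the passages relevant to your question.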
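The idea of splitting a long document and passing along only the relevant part can be sketched roughly as follows. This is an illustration under stated assumptions, not how any particular RAG system works: split_into_chunks and find_relevant are hypothetical helpers, chunks are measured in words rather than real tokens, and relevance is a plain keyword match where real systems usually rely on embedding-based search.

```python
def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


def find_relevant(chunks: list[str], keyword: str) -> list[str]:
    """Return the chunks that mention the keyword (case-insensitive)."""
    return [chunk for chunk in chunks if keyword.lower() in chunk.lower()]


# A toy "contract": two clauses separated by filler text.
contract = (
    "Clause 1: Parties. "
    + "filler " * 50
    + "Clause 2: Payment is due in 30 days. "
    + "filler " * 50
)

chunks = split_into_chunks(contract, chunk_size=20)
relevant = find_relevant(chunks, "Clause 2")

# Only the matching chunk would be sent to the model, not the whole contract.
print(len(chunks), len(relevant))
```

Fixed-size chunks can cut a clause in half, as happens to the end of clause 2 here, so real pipelines often overlap neighboring chunks or split on section boundaries instead.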
