Tokens
The pieces AI breaks all text into.
The analogy
To an AI, text is like a Lego build: the model doesn't see whole words but smaller pieces called tokens. “Hello” might be a single piece; “extraordinary” might be three or four. The AI reads and writes piece by piece.
In detail
A token is the smallest unit the model processes: a word, a chunk of a word, or a punctuation mark. A tokenizer, typically built with an algorithm such as byte-pair encoding (BPE), splits text into these units. Tokens matter because context limits and API pricing are measured in tokens, not words. In English, a word averages roughly 1.3 tokens, though the exact ratio depends on the tokenizer and the text.
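To make the 1.3 rule of thumb concrete, here is a minimal Python sketch that estimates a token count from a word count. The function names estimate_tokens and fits_in_context are hypothetical, and the ratio is only the rough English average quoted above, so treat the result as a ballpark figure. For billing or hard context limits, count with the model's real tokenizer.

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough token estimate: whitespace-separated words times an average ratio.

    The 1.3 default is the rule of thumb for English, not an exact value.
    """
    words = text.split()
    return round(len(words) * tokens_per_word)


def fits_in_context(text: str, context_limit: int, tokens_per_word: float = 1.3) -> bool:
    """Check whether the estimated token count stays within a context limit."""
    return estimate_tokens(text, tokens_per_word) <= context_limit


sentence = "Artificial intelligence is fascinating"
print(estimate_tokens(sentence))  # 4 words * 1.3 = 5.2, which rounds to 5
```

The estimate is deliberately crude: it ignores punctuation, rare words, and non-English text, all of which tend to produce more tokens per word.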
An example
The sentence “Artificial intelligence is fascinating” might be split like this: “Artificial”, “ intelligence”, “ is”, “ fascin”, “ating”. Five tokens for four words.
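The split above can be reproduced with a toy tokenizer. The sketch below is not BPE and not any real model's tokenizer: it is a simple greedy longest-match over a tiny, hand-picked vocabulary (the toy_vocab set is an assumption made purely for illustration). It shows how one word can end up as a single piece while another is broken into two.

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenizer (illustrative only, not BPE).

    At each position, take the longest vocabulary entry that matches;
    if nothing matches, fall back to a single character.
    """
    tokens = []
    max_len = max(map(len, vocab), default=1)
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        for end in range(min(len(text), i + max_len), i, -1):
            piece = text[i:end]
            if piece in vocab or end == i + 1:
                tokens.append(piece)
                i = end
                break
    return tokens


# Hypothetical vocabulary chosen to match the example split above.
toy_vocab = {"Artificial", " intelligence", " is", " fascin", "ating"}

sentence = "Artificial intelligence is fascinating"
tokens = tokenize(sentence, toy_vocab)
print(tokens)       # ['Artificial', ' intelligence', ' is', ' fascin', 'ating']
print(len(tokens))  # 5
```

Note that the leading space belongs to the token that follows it, so joining the tokens back together gives the original sentence exactly.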