Transformer
The architecture that made everything else possible.
The analogy
At a party, you don't listen to everyone equally: you pay attention to whoever says something relevant to you. The transformer gave machines that ability: “attention” lets every word look at all the others and decide which ones matter for its meaning.
In detail
A neural network architecture introduced in 2017 in the paper “Attention Is All You Need”. Its self-attention mechanism processes all positions of a sequence in parallel and captures long-range relationships, making it far more trainable at scale than earlier recurrent networks. GPT, Claude, Gemini and Llama are all transformers.
An example
In “The cat we saw yesterday at the park was asleep”, to understand “was asleep” the model needs to connect it to “the cat”, eight words away. Attention does exactly that.