Building a Transformer

From tokenization to attention to feedforward layers — how the core architecture actually works.
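As a preview of the core operation covered later, scaled dot-product attention can be sketched in a few lines of NumPy. The function name, shapes, and random inputs below are illustrative assumptions for this sketch, not code from the article.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) arrays -- illustrative shapes, not the article's code.
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled to keep softmax well-behaved.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value rows.
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Note that the output has the same shape as the queries: attention mixes information across positions without changing the per-position dimensionality, which is what lets it stack cleanly with the feedforward layers that follow.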