Building a TransformerFrom tokenization to attention to feedforward layers — how the core architecture actually works.