Matrix Operations on GPU

GEMM (general matrix multiply), tiling, shared memory, and why most of the compute in a transformer layer comes down to matrix multiplication.

Coming soon.