Attention & Memory

FlashAttention, IO-aware algorithms, and why materializing the full N x N attention matrix (N = sequence length) in off-chip memory kills performance.
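
As a rough illustration of what avoiding a materialized attention matrix means, the sketch below computes softmax attention for a single query in two ways: a reference version that builds the full row of scores first, and a blockwise version that keeps only a running maximum, a running normalizer, and an output accumulator (the online-softmax idea that FlashAttention builds on). It is plain Python for readability, not the FlashAttention kernel itself; the function names, the block size, and the toy inputs are all assumptions made for this example.

```python
import math


def score(q, k):
    """Scaled dot product between one query and one key."""
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


def naive_attention_row(q, keys, values):
    """Reference: compute every score for this query, then softmax."""
    scores = [score(q, k) for k in keys]          # full row held in memory
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def tiled_attention_row(q, keys, values, block_size=2):
    """Blockwise version: only one block of scores exists at a time."""
    dim = len(values[0])
    running_max = float("-inf")   # largest score seen so far
    running_sum = 0.0             # softmax denominator under running_max
    acc = [0.0] * dim             # unnormalized weighted sum of values

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [score(q, k) for k in k_blk]

        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new reference maximum.
        # On the first block this factor is exp(-inf) == 0.0.
        rescale = math.exp(running_max - new_max)
        running_sum *= rescale
        acc = [a * rescale for a in acc]

        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]

        running_max = new_max

    return [a / running_sum for a in acc]


if __name__ == "__main__":
    # Toy inputs, chosen arbitrarily for the demonstration.
    q = [0.3, -0.2]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [2.0, 0.1]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    print(naive_attention_row(q, keys, values))
    print(tiled_attention_row(q, keys, values, block_size=2))
```

Both functions should agree up to floating-point rounding. The difference is in working memory: the reference holds one score per key, while the blockwise version holds at most block_size scores plus a fixed-size running state, which is the property that lets a real kernel keep its working set in fast on-chip memory.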

Coming soon.