Long Context Handling

RoPE scaling, sliding window attention, and what breaks at 100K+ token contexts.

Coming soon.