Efficient Inference

LoRA at inference time, speculative decoding, and draft model verification strategies.

Coming soon.