Inference Economics

GPU cost per token, batching efficiency, latency-throughput tradeoffs, and margin at scale.
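The core arithmetic behind these topics can be sketched in a few lines. The numbers below (GPU hourly rate, aggregate token throughput, list price) are illustrative assumptions, not figures from any particular provider:

```python
def cost_per_million_tokens(gpu_cost_per_hour: float, tokens_per_second: float) -> float:
    """Dollar cost to generate 1M tokens on one GPU at a given aggregate throughput."""
    tokens_per_hour = tokens_per_second * 3600.0
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

def gross_margin(price_per_million: float, cost_per_million: float) -> float:
    """Gross margin as a fraction of revenue per 1M tokens served."""
    return (price_per_million - cost_per_million) / price_per_million

# Hypothetical figures: a $2.50/hr GPU serving 1,000 tok/s across a full batch.
cost = cost_per_million_tokens(2.50, 1000.0)      # ~$0.69 per 1M tokens
margin = gross_margin(2.00, cost)                 # margin at a $2.00/1M list price
print(f"cost/1M tokens: ${cost:.2f}, margin: {margin:.0%}")
```

Note that `tokens_per_second` here is the *aggregate* rate across a batch: larger batches raise throughput (lowering cost per token) at the expense of per-request latency, which is the tradeoff the section title names.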