SAW-INT4: System-Aware 4-Bit KV-Cache Quantization for Real-World LLM Serving

(arxiv.org)

2 points | by matt_d 11 hours ago

No comments yet.