vLLM introduces memory optimizations for long-context inference

(github.com)

5 points | by addisud 18 hours ago

1 comment