DSpark: Speculative decoding accelerates LLM inference [pdf]

(github.com)

646 points | by aurenvale 8 hours ago

238 comments