User @thin_signal developed a tool for mixed-precision quantization on MLX. They performed a sensitivity analysis across the model's layers and applied less aggressive quantization to the more sensitive layers and more aggressive quantization to the layers that are more robust.
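The core idea can be sketched in a few lines. This is a hypothetical illustration, not OptiQ's actual code: the layer names, sensitivity scores, and the `assign_bits` helper are all made up, and real sensitivity scores would come from measuring quality loss per layer.

```python
# Hypothetical sketch of sensitivity-guided mixed-precision bit allocation.
# Layer names, scores, and the threshold are illustrative assumptions only.

def assign_bits(sensitivities, high_bits=8, low_bits=4, threshold=0.5):
    """Give sensitive layers more bits, robust layers fewer."""
    return {
        name: (high_bits if score >= threshold else low_bits)
        for name, score in sensitivities.items()
    }

# Toy per-layer sensitivity scores (e.g. quality drop when that layer
# alone is quantized to low precision)
sensitivities = {
    "embed_tokens": 0.9,
    "layers.0.attn": 0.7,
    "layers.0.mlp": 0.2,
    "lm_head": 0.8,
}

plan = assign_bits(sensitivities)
# Sensitive layers end up at 8-bit, robust ones at 4-bit
```

The resulting per-layer bit plan could then be fed to a quantizer that supports per-layer precision, which is what mixed-precision schemes on MLX build on.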
The tool, which is documented here (https://mlx-optiq.pages.dev/), also implements the recently announced TurboQuant KV-cache optimization, so in total this should greatly improve the quality of locally run LLMs.
Looking forward to an OptiQ release of the Gemma 4 family.