User @thin_signal developed a tool for mixed-precision quantization on MLX. They performed a sensitivity analysis across the model's layers and applied less aggressive quantization to the more sensitive layers and more aggressive quantization to the layers that are more robust.
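The core idea can be sketched in a few lines. This is a hypothetical illustration, not OptiQ's actual code: the layer names, sensitivity scores, and the `assign_bits` helper are all made up, and real sensitivity scores would come from measuring quality loss per layer.

```python
# Hypothetical sketch of sensitivity-guided mixed-precision bit allocation.
# Layer names, scores, and the threshold are illustrative assumptions only.

def assign_bits(sensitivities, high_bits=8, low_bits=4, threshold=0.5):
    """Give sensitive layers more bits, robust layers fewer."""
    return {
        name: (high_bits if score >= threshold else low_bits)
        for name, score in sensitivities.items()
    }

# Toy per-layer sensitivity scores (e.g. quality drop when that layer
# alone is quantized to low precision)
sensitivities = {
    "embed_tokens": 0.9,
    "layers.0.attn": 0.7,
    "layers.0.mlp": 0.2,
    "lm_head": 0.8,
}

plan = assign_bits(sensitivities)
# Sensitive layers end up at 8-bit, robust ones at 4-bit
```

The resulting per-layer bit plan could then be fed to a quantizer that supports per-layer precision, which is what mixed-precision schemes on MLX build on.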
The tool, which is documented here (https://mlx-optiq.pages.dev/), also implements the recently announced TurboQuant KV-cache optimization, so in total this should greatly improve the quality of locally run LLMs.
Looking forward to an OptiQ release of the Gemma 4 family.