3 points | by g023 13 hours ago
2 comments
Starred immediately.
This is exactly the kind of practical quantization work that makes running longer-context models on consumer GPUs actually feasible. Looking forward to seeing it generalized beyond the one model. Great stuff, g023.
What does it actually do?