I ported a research KV-cache quantizer, KVarN, to Apple Silicon from my iPhone in Japan using a $30 Chinese LLM
I just shipped mlx-kvarn: the first MLX-native implementation of KVarN, a KV-cache quantization method from a 2026 Huawei paper. It gives you up to ~4.7× more KV-cache capacity on a...