That’s just good old block floating point, right?
https://en.wikipedia.org/wiki/Block_floating_point
Yes, but standard block floating point uses a linear grid scaled by a shared exponent, whereas AXS-6 uses a NormalFloat grid scaled by a shared exponent to maximize information density for bell-curve-distributed weights. Essentially, it's a block-scaled NormalFloat-5.
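Here is a minimal pure-Python sketch that makes the linear-grid vs. NormalFloat-grid distinction concrete. It is not the AXS-6 implementation. The grid construction borrows QLoRA's NF4 recipe (quantiles of N(0, 1), with an exact zero). The 5-bit width, the block size of 32, the power-of-two shared exponent, and the helper names (`normalfloat_grid`, `linear_grid`, `quantize_block`, `block_mse`) are all assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def linspace(start, stop, num):
    """Evenly spaced values from start to stop, inclusive (num >= 2)."""
    return [start + (stop - start) * i / (num - 1) for i in range(num)]


def normalfloat_grid(bits, offset=0.9677):
    """NormalFloat-style grid in [-1, 1] with 2**bits levels, including exact zero.

    Levels sit at quantiles of N(0, 1), with one more positive level than
    negative so that 0 is representable (QLoRA's NF4 recipe). Reusing
    QLoRA's NF4 offset for other bit widths is an assumption.
    """
    nd = NormalDist()
    n_pos = 2 ** (bits - 1)
    n_neg = 2 ** (bits - 1) - 1
    pos = [nd.inv_cdf(p) for p in linspace(offset, 0.5, n_pos + 1)[:-1]]
    neg = [-nd.inv_cdf(p) for p in linspace(offset, 0.5, n_neg + 1)[:-1]]
    levels = sorted(pos + neg + [0.0])
    peak = max(abs(v) for v in levels)
    return [v / peak for v in levels]


def linear_grid(bits):
    """Classic block floating point mantissas: symmetric signed integers scaled to [-1, 1]."""
    m = 2 ** (bits - 1) - 1
    return [k / m for k in range(-m, m + 1)]


def quantize_block(block, grid):
    """Share one power-of-two exponent across the block, then snap each value to the grid."""
    amax = max(abs(x) for x in block)
    exp = math.ceil(math.log2(amax)) if amax > 0 else 0
    scale = 2.0 ** exp  # block values divided by scale now lie (essentially) in [-1, 1]
    return [scale * min(grid, key=lambda g: abs(g - x / scale)) for x in block]


def block_mse(values, grid, block_size=32):
    """Mean squared quantization error over consecutive blocks of block_size values."""
    err = 0.0
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        q = quantize_block(block, grid)
        err += sum((a - b) ** 2 for a, b in zip(block, q))
    return err / len(values)


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 1.0) for _ in range(4096)]  # bell-curve stand-in for real weights
    print("linear grid MSE:     ", block_mse(weights, linear_grid(5)))
    print("NormalFloat grid MSE:", block_mse(weights, normalfloat_grid(5)))
```

On Gaussian test data like this, the NormalFloat grid should give the lower error because its levels are packed where most of the weights sit. On uniformly spread data, the linear grid can come out ahead.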
FP6 with a block size of 32 is a tough sell today, when Blackwell has native support for FP4 with a block size of 16.
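One concrete dimension of that trade-off is storage per weight. Here is a back-of-the-envelope sketch that takes the comment's figures at face value and assumes an 8-bit shared scale per block in both formats. Blackwell's NVFP4 pairs each 16-element FP4 block with an 8-bit FP8 scale. The 8-bit scale for the FP6 format is an assumption, any per-tensor scale is ignored, and `bits_per_element` is a hypothetical helper.

```python
def bits_per_element(element_bits, block_size, scale_bits=8):
    """Effective storage per weight: element bits plus the shared scale amortized over the block.

    scale_bits=8 is an assumed shared-scale width, not a confirmed detail of either format.
    """
    return element_bits + scale_bits / block_size


fp6_block32 = bits_per_element(6, 32)  # 6 + 8/32
fp4_block16 = bits_per_element(4, 16)  # 4 + 8/16
print(f"FP6, block 32: {fp6_block32} bits/weight")
print(f"FP4, block 16: {fp4_block16} bits/weight")
```

This ignores accuracy and compute throughput, which also depend on whether the hardware can consume the format natively.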
How can I contact you?