28 points | by hochmartinez 9 hours ago
11 comments
- https://news.ycombinator.com/item?id=47086181
- https://taalas.com/the-path-to-ubiquitous-ai/
- https://www.nextplatform.com/2026/02/19/taalas-etches-ai-mod...
This is a demo of Taalas inference ASIC hardware. Prior discussion @ https://news.ycombinator.com/item?id=47086181
I love seeing optimised SLM inference. Is there a current use-case for this? Edge CNNs make sense to me but not edge SLMs (yet).
What model and hardware powers this?
Is this a Google T5 based model?
3-bit hard-wired Llama 3.1 8B ( https://taalas.com/the-path-to-ubiquitous-ai/ )
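For anyone wondering what "3-bit" means in practice, here's a minimal sketch of symmetric 3-bit weight quantization. To be clear, Taalas' actual scheme isn't described in the linked post at this level of detail, so the quantize_3bit helper below is purely illustrative, not theirs:

    import numpy as np

    # Illustrative symmetric 3-bit quantization (a sketch, not Taalas' scheme).
    # 3 bits give signed integer codes in the range [-4, 3].
    def quantize_3bit(w):
        scale = np.abs(w).max() / 4.0                        # map max magnitude to code 4
        q = np.clip(np.round(w / scale), -4, 3).astype(np.int8)
        return q, scale

    def dequantize_3bit(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_3bit(w)
    print(np.abs(w - dequantize_3bit(q, s)).max())           # worst-case quantization error

The point of going this narrow is that each weight becomes small and fixed enough to etch directly into the chip instead of streaming it from memory.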
If this is possible, why don't all online AI engines work like this?
This is a specific model (Llama 3.1 8B) baked into hardware. You can only run that one model, but in exchange you get "low" power consumption and crazy speed.
If you want to run a different model, you need new hardware for that model.
And it really is a crazy speed: ~15k tokens/second.
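To see why baking one model into silicon buys that speed, here's my own back-of-envelope bandwidth estimate, assuming a dense 8B-parameter model whose weights are all read once per generated token (no batching):

    # Rough arithmetic only; assumes every weight is streamed once per token.
    params = 8e9              # Llama 3.1 8B
    bits_per_weight = 3       # per the Taalas writeup
    tokens_per_sec = 15_000

    bytes_per_token = params * bits_per_weight / 8    # ~3 GB read per token
    required_bw = bytes_per_token * tokens_per_sec    # ~45 TB/s sustained
    print(f"{bytes_per_token / 1e9:.0f} GB/token -> {required_bw / 1e12:.0f} TB/s")

That ~45 TB/s is well beyond a single GPU's DRAM bandwidth, which is why keeping the weights on-die, and giving up model flexibility, is the trade being made here.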
Impressive, but this particular underlying LLM is objectively weak. I'd like to see it done with a larger, newer model.
Imagine a model like Opus 4.6 at that speed. That would be insane.