28 points | by hochmartinez 9 hours ago
11 comments
- https://news.ycombinator.com/item?id=47086181
- https://taalas.com/the-path-to-ubiquitous-ai/
- https://www.nextplatform.com/2026/02/19/taalas-etches-ai-mod...
This is a demo of Taalas inference ASIC hardware. Prior discussion @ https://news.ycombinator.com/item?id=47086181
I love seeing optimised SLM inference. Is there a current use-case for this? Edge CNNs make sense to me but not edge SLMs (yet).
What model and hardware powers this?
Is this a Google T5 based model?
3-bit hard-wired Llama 3.1 8B ( https://taalas.com/the-path-to-ubiquitous-ai/ )
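For anyone wondering what "3-bit" means in practice, here's a minimal sketch of symmetric 3-bit weight quantization. To be clear, Taalas' actual scheme isn't described in the linked post at this level of detail, so the quantize_3bit helper below is purely illustrative, not theirs:

    import numpy as np

    # Illustrative symmetric 3-bit quantization (a sketch, not Taalas' scheme).
    # 3 bits give signed integer codes in the range [-4, 3].
    def quantize_3bit(w):
        scale = np.abs(w).max() / 4.0                        # map max magnitude to code 4
        q = np.clip(np.round(w / scale), -4, 3).astype(np.int8)
        return q, scale

    def dequantize_3bit(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_3bit(w)
    print(np.abs(w - dequantize_3bit(q, s)).max())           # worst-case quantization error

The point of going this narrow is that each weight becomes small and fixed enough to etch directly into the chip instead of streaming it from memory.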
If this is possible, why don't all online AI engines work like this?
This is a specific model (Llama 3.1 8B) baked into hardware. You can only run that one model, but in exchange you get "low" power consumption and crazy speed.
If you want to run a different model, you need new hardware for that model.
And it really is a crazy speed: ~15k tokens/second.
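To see why baking one model into silicon buys that speed, here's my own back-of-envelope bandwidth estimate, assuming a dense 8B-parameter model whose weights are all read once per generated token (no batching):

    # Rough arithmetic only; assumes every weight is streamed once per token.
    params = 8e9              # Llama 3.1 8B
    bits_per_weight = 3       # per the Taalas writeup
    tokens_per_sec = 15_000

    bytes_per_token = params * bits_per_weight / 8    # ~3 GB read per token
    required_bw = bytes_per_token * tokens_per_sec    # ~45 TB/s sustained
    print(f"{bytes_per_token / 1e9:.0f} GB/token -> {required_bw / 1e12:.0f} TB/s")

That ~45 TB/s is well beyond a single GPU's DRAM bandwidth, which is why keeping the weights on-die, and giving up model flexibility, is the trade being made here.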
Impressive, but this particular underlying LLM is objectively weak. I'd like to see it done with a larger, newer model.
Imagine a model like Opus 4.6 at that speed. That would be insane.