> No need to process anything in parallel, the machine can run the token through 100 layers faster than the user can type.
Yeah, if your use case is chat, sure, it can run faster than you can type. But for anything useful, like code autocomplete or agentic coding, the context is always in the hundreds of thousands of tokens, and the new prompt is usually 50 to a few thousand tokens (if you're including error tracebacks, for example). On top of that, the user will typically expect hundreds of thousands of tokens to be generated. If you don't believe me, use Cerebras with Claude Code and watch it generate millions of tokens in a few minutes.
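Rough back-of-the-envelope sketch of why parallel prefill matters at that scale (all rates here are illustrative assumptions, not benchmarks of any particular hardware):

```python
# If every context token had to go through the layers one at a time,
# a typical agentic-coding context would take minutes just to ingest.
# Batched (parallel) prefill is what makes it usable.

context_tokens = 200_000      # assumed agentic-coding context size
sequential_rate = 100         # assumed tokens/s, one token at a time
prefill_rate = 20_000         # assumed tokens/s with parallel prefill

print(f"sequential prefill: ~{context_tokens / sequential_rate:.0f} s")  # ~2000 s (~33 min)
print(f"parallel prefill:   ~{context_tokens / prefill_rate:.0f} s")     # ~10 s
```

The exact numbers don't matter; the point is the gap is orders of magnitude, so "faster than the user can type" only holds for tiny chat prompts, not long coding contexts.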
ofc not