Too bad they don't support LC-3 or DLX. More my level lol. So begins another deep dive side quest with the chatbot into a tool I didn't even know existed.
Also nicely demonstrated in godbolt's currently ongoing Advent of compiler optimizations series.
Wouldn't LLVM adjust the models if it is beneficial to its code generation, even if the result less accurately reflects the processor? (I think GCC does that.)
This is interesting, if quite incomplete (as noted in the conclusion). CPU reorder buffers turn what you think of as mostly sequential execution into a massively parallel engine: data memory access, prefetching, speculative execution, etc. But if you are running a micro-benchmark with a tight loop over millions of iterations, then understanding the pipeline dependencies and dispatching can provide good insights.
Yep. Cache is always the wildcard.
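To make the tight-loop point concrete, here's a minimal sketch (my own example, not from the article, and assuming the tool under discussion is llvm-mca): two ways to sum an array where the only difference is the dependency structure the out-of-order core sees.

    /* sum.c -- illustrative only. Serial chain: every add waits on the
       previous result, so throughput is bounded by FP-add latency. */
    #include <stddef.h>

    double sum_serial(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];          /* one long dependency chain */
        return s;
    }

    /* Two independent accumulators: the core can keep both adds in
       flight at once, roughly halving the loop's critical path. */
    double sum_split(const double *a, size_t n) {
        double s0 = 0.0, s1 = 0.0;
        size_t i = 0;
        for (; i + 1 < n; i += 2) {
            s0 += a[i];         /* chain 0 */
            s1 += a[i + 1];     /* chain 1, independent of chain 0 */
        }
        if (i < n) s0 += a[i];  /* odd leftover element */
        return s0 + s1;
    }

Feeding the compiled assembly through llvm-mca (the -mcpu value here is just an example) shows the dispatch and latency difference in its timeline view, with the usual caveat from above: it models the pipeline, not the caches.

    clang -O2 -S -o - sum.c | llvm-mca -mcpu=skylake -timeline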