Built a small tool to compare LLMs side-by-side across benchmarks, arena scores, pricing, and modalities. Select up to 10 models and their stats line up in aligned columns.
No backend — all data is pre-fetched from api.zeroeval.com weekly via GitHub Actions and served as static JSON from GitHub Pages. Fully shareable URLs via ?m= param.
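For anyone curious how the shareable URLs could work, here's a minimal vanilla-JS sketch of round-tripping the selection through the ?m= param. The comma-separated slug format matches the links in this thread; the replaceState approach and function names are my assumptions, not necessarily what the repo does:

    // Read selected model slugs from the URL, e.g. ?m=gpt-4o,claude-3-5-sonnet
    function modelsFromUrl() {
      const raw = new URLSearchParams(location.search).get('m');
      return raw ? raw.split(',').filter(Boolean) : [];
    }

    // Write the current selection back into the URL without reloading,
    // so the address bar is always a shareable link.
    function syncUrl(slugs) {
      const url = new URL(location.href);
      if (slugs.length) url.searchParams.set('m', slugs.join(','));
      else url.searchParams.delete('m');
      history.replaceState(null, '', url);
    }

URLSearchParams percent-encodes the commas (%2C), which is exactly what you see in the shared links below.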
The data is also available directly as JSON for programmatic use (see llms.txt). Stack is vanilla HTML/CSS/JS, no framework.
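And a quick sketch of consuming the static JSON directly. The exact file path here is a guess on my part (llms.txt in the repo lists the real endpoints), as is the response shape:

    // Hypothetical endpoint -- check llms.txt for the actual data files.
    const DATA_URL = 'https://broskees.github.io/llm-compare/data/models.json';

    async function loadModels() {
      const res = await fetch(DATA_URL);
      if (!res.ok) throw new Error(`fetch failed: ${res.status}`);
      return res.json(); // assumed shape: an array of model records
    }

    loadModels().then(models => console.log(models.length, 'models'));

Since it's all static JSON on GitHub Pages, there's no auth or rate-limit dance; a plain fetch from anywhere should work.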
https://github.com/broskees/llm-compare
I hate that every comparison tool only lets you compare 2 models at a time, so this is nice. GJ.
I'm seeing some surprisingly strong benchmark performance from open-weight models compared to the top models: https://broskees.github.io/llm-compare/?m=claude-opus-4-6%2C...