Secret Hitler LLM Benchmark

(github.com)

4 points | by jordan_gibbs 10 hours ago

1 comment

jordan_gibbs 10 hours ago
Built this for fun; uses OpenRouter for easy access to a variety of models.
The joy of seeing these AIs flail and argue is unmatched and reassuring; they're unbelievably bad at the game. However, it is an interesting view into their ability to deceive.
This costs a ton to run, and I don't currently have the funds to properly benchmark across frontier models, so I'd love it if we could source some data from the community!