Built this for fun; uses OpenRouter for easy access to a variety of models.
Watching these AIs flail and argue is an unmatched joy, and a reassuring one; they're unbelievably bad at the game. Still, it's an interesting view into their ability to deceive.
This costs a ton to run, and I don't currently have the funds to properly benchmark across frontier models, so I'd love it if we could source some data from the community!