We built Eval Studio, a CLI tool for testing AI agents locally.
Most agent workflows I’ve seen don’t have any real evaluation layer. People test manually or rely on prompt tweaks.
I wanted something closer to how we treat backend systems, where you can run tests before shipping.
Eval Studio:
* scans your repo and detects likely agents
* generates eval datasets based on your agent
* runs tests locally against your implementation
* surfaces failures and behavioral gaps
It doesn’t require deploying anything — it runs directly on your local setup.
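To make the "run tests before shipping" idea concrete, here's a minimal sketch of what an agent eval case can look like in plain Python. The dataset format, the `toy_agent` function, and the substring check are illustrative assumptions for this post, not Eval Studio's actual API.

```python
# Illustrative sketch only -- not Eval Studio's API.
# A minimal eval loop: run each dataset case through an agent
# function and flag outputs that miss the expected behavior.

def toy_agent(prompt: str) -> str:
    # Stand-in for a real LLM agent call.
    return "refund approved" if "refund" in prompt.lower() else "escalate to human"

# Hypothetical eval dataset: (input, expected substring) pairs.
DATASET = [
    ("Customer asks for a refund on a damaged item", "refund"),
    ("Customer threatens legal action", "escalate"),
]

def run_evals(agent, dataset):
    """Return the list of failing cases: (prompt, expected, actual)."""
    failures = []
    for prompt, expected in dataset:
        output = agent(prompt)
        if expected not in output:
            failures.append((prompt, expected, output))
    return failures

if __name__ == "__main__":
    failures = run_evals(toy_agent, DATASET)
    print(f"{len(DATASET) - len(failures)}/{len(DATASET)} cases passed")
```

The point is that agent behavior becomes a checked artifact in your repo rather than something you eyeball in a chat window; a real harness would add LLM-graded checks on top of literal assertions like these.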
Get your API key and try it: dutchmanlabs.com
Would really appreciate feedback, especially from people building LLM apps or agent workflows.