Auto-harness: Self-improving agentic systems with auto-evals (open-sourced)

(twitter.com)

3 points | by gauri1902 13 hours ago

2 comments

gauri1902 13 hours ago
Hey all, we just released our work on self-improving AI systems at NeoSigma. We demonstrate our automatic agent-harness improvement system on Tau3 benchmark tasks, where the agent's score improves from 0.56 to 0.78 (a ~40% relative jump) as the system mines failures and automatically maintains live evals. A lot of people asked to try the self-improving loop on their own agents, so we open-sourced our setup as auto-harness, an open-source library for self-improving agentic systems with auto-evals. Connect your agent and let it cook over the weekend. Watch it go brrrr!! Link to the article: https://x.com/gauri__gupta/status/2040251170099524025
deadinator 13 hours ago
Point it at your agent. Leave it running. Come back to a better agent with evals!!