Agent-evals: Metacognitive scoring and boundary testing for LLM coding agents

(thinkwright.ai)

2 points | by oceanwaves 12 hours ago

1 comment