6 points | by jflynt76 10 hours ago
2 comments
> GroundEval is built around that question...
> This is the same distinction GroundEval makes for question answering agents.
> GroundEval treats agent behavior as something that can be tested against a state contract.
> That is the class of failure GroundEval is designed to catch.
this is an ad shaped like a blog post
For what it's worth, I didn't describe it as anything; just posted the link. It's a paper with open code, no product behind it.