SlopCodeBench: Benchmarking How Coding Agents Degrade over Long-Horizon Tasks

(scbench.ai)

2 points | by matt_d 13 hours ago
