Why SWE-bench Verified no longer measures frontier coding capabilities

(openai.com)

2 points | by gmays 10 hours ago

1 comment