SWE-bench Verified no longer measures frontier coding capabilities

(openai.com)

320 points | by kmdupree a day ago

171 comments