MathNet: 30k competition math problems for AI mathematical reasoning benchmarking

(mathnet.mit.edu)

5 points | by nill0 7 hours ago

2 comments

LeCompteSftware 12 minutes ago
Hmm, I already found a typo in one of the solutions. I believe this was scraped from a bunch of PDFs in an unaudited automated process, so of course there are going to be some problems. But
a) It doesn't bode well that I poked at three problems and already found an issue.
b) Even if it took 50 problems before my sampling paid off, there are 30,000 things to review here. I am not sure anyone actually took responsibility for even reading it, let alone making sure it was correct.
I am getting tired of doing basic sanity-checking on this stuff. Maybe I just got extremely unlucky and found one of the 300 problems with a typo. But I have been feeling awfully dejected at seeing so much garbage vibe code this year, and am not feeling particularly charitable to this. If volunteer QA can find a problem with 5 minutes of not particularly close reading, then it doesn't seem like this is ready for release.
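For a rough sense of how unlucky that would be, here is a minimal sketch of the arithmetic. It assumes the hypothetical figure of 300 flawed problems out of 30,000 from the comment above (not a measured error rate) and that the problems poked at were a uniform random sample without replacement; the function name is made up for illustration.

```python
from math import comb

def p_at_least_one(total: int, flawed: int, sampled: int) -> float:
    """Chance that a uniform random sample (drawn without replacement)
    contains at least one flawed item: 1 - P(every sampled item is clean)."""
    return 1 - comb(total - flawed, sampled) / comb(total, sampled)

# Hypothetical numbers taken from the comment, not from the dataset itself.
TOTAL = 30_000
FLAWED = 300

p3 = p_at_least_one(TOTAL, FLAWED, 3)    # three problems checked
p50 = p_at_least_one(TOTAL, FLAWED, 50)  # the "50 problems" scenario

print(f"3 samples:  {p3:.1%}")
print(f"50 samples: {p50:.1%}")
```

Under those assumptions, hitting a flawed problem within three tries has a probability of about 3%, so a hit that early is weak evidence that the true error rate is above 1%. Three samples are far too few to estimate the rate itself.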
nill0 7 hours ago
Relevant article:
https://news.mit.edu/2026/mit-scientists-build-worlds-larges...