They say in the Lean Zulip thread that this is actually intentionally a "low quality" release (https://leanprover.zulipchat.com/#narrow/channel/583336-Auto...); the paper notes that the quality is "inferior to that of expert-written Lean code". Then again, "Our results suggest that formalizing the core textbook infrastructure of modern mathematics is within reach".
In summary, we present a system which actually doesn’t work because real experts looked through the thing our system produced and found that it was full of reward hacking, but of course we’re honest about this and we’ll adjust our metrics, and so we think it’s okay to just slop this out there because we’re honest and we will make promises to fix everything, and we also hope you like our attempt to gain notoriety for Meta AI.
> However, our system relies on frontier LLMs, whose compute costs still remain nontrivial. Moreover, each textbook was formalized in isolation, without the careful planning needed to make it maximally compatible with existing mathlib infrastructure. Human involvement currently remains necessary to handle such organisational concerns — selecting and ordering books in a dependency-aware manner, bridging mismatched conventions across sources, etc
I take a very dim view of slopping out 500kloc and then giving it to unpaid experts to perform the actual work of checking it (confirmed at https://leanprover.zulipchat.com/#narrow/channel/583336-Auto... that this is what they did), especially given the reported incorrectness of the code (https://leanprover.zulipchat.com/#narrow/channel/583336-Auto... or https://leanprover.zulipchat.com/#narrow/channel/583336-Auto... for example).
They say in the Lean Zulip thread that this is actually intentionally a "low quality" release (https://leanprover.zulipchat.com/#narrow/channel/583336-Auto...); the paper notes that the quality is "inferior to that of expert-written Lean code". Then again, "Our results suggest that formalizing the core textbook infrastructure of modern mathematics is within reach".
In summary, we present a system which actually doesn’t work because real experts looked through the thing our system produced and found that it was full of reward hacking, but of course we’re honest about this and we’ll adjust our metrics, and so we think it’s okay to just slop this out there because we’re honest and we will make promises to fix everything, and we also hope you like our attempt to gain notoriety for Meta AI.
> However, our system relies on frontier LLMs, whose compute costs still remain nontrivial. Moreover, each textbook was formalized in isolation, without the careful planning needed to make it maximally compatible with existing mathlib infrastructure. Human involvement currently remains necessary to handle such organisational concerns — selecting and ordering books in a dependency-aware manner, bridging mismatched conventions across sources, etc