> de Grivel was more forthcoming about the code's provenance:
>
> "No Linux source files were ever read to build this driver. It's pure AI (ChatGPT and Claude-code) and careful code reviews and error checking and building kernel and rebooting/testing from my part."
I think he was confused; what he meant to write was: "To build this driver, more Linux source files have been read than any human except Linus Torvalds and a select few have ever read".
I'm not getting into whether licenses still apply after code has been used to train an LLM and the LLM then spits back another implementation (although, if I'm not mistaken, it's been shown that an LLM reproduced Harry Potter verbatim, and I take it that if it's word-for-word, licenses do apply after all)...
But saying "No Linux source files were ever read" when LLMs have been trained on every Linux source file ever written, going back to before Linux even used Git, is quite something.
That's literally how LLMs are trained: by being fed a shitload of data.
P.S.: I'm no hater, btw: I pay Anthropic and use Claude Code daily (today we're writing some elisp code that calls tree-sitter).
Previous discussion: https://news.ycombinator.com/item?id=47546732
Here's the mailing list thread, in case anyone is interested:
https://marc.info/?t=177377722400001&r=1&w=2
"Everything is a derivative work", as the saying goes.
He is, of course, stating both:
1. He didn't read the source files.
2. He didn't feed the source files to the LLM.
He's not claiming the LLM didn't train on these.
The issue, however, is that he's claiming what amounts to a "clean room" implementation, which is blatantly not true because, as you say, the LLM was trained on the sources, so the room itself just isn't clean.