> In looking at the code that the LLMs have produced for the project, especially given the pretty massive and widespread architectural changes needed to make the implementation libified and memory safe, we decided that the codebase is not a derivative work that would require carrying forward the GPL license and have decided to release the code under the MIT instead.
I don't think it's that clear cut. The functional parts probably aren't copyrightable, only the stylistic ones. If it goes to court, it's going to be a mix of courts applying laws in ways that haven't been tried before and fact-specific questions about what actually persisted through the LLM.
I'd be fascinated to see what happens if it does. Both in the analyses that we'd get of what the LLM did to the codebase and on the legal decisions on what the copyrightable creative elements in code actually are.
If I was the author though... there would be no way that I would be volunteering to be a test case like this. Also seems just rude for no reason.
It probably would have been less bad if he had chosen MPL-2.0 or LGPL-2.1-or-later. But he chose MIT, which cuts at the core of the intent of licensing the project with a share-alike license.
Tell me, can I create a copyrighted video that's not GPL licensed using ffmpeg?
Now tell me how creating a rust library using the git test suite is different?
But for the sake of argument: The test suite itself is copyrighted. To the extent the resulting work is a derivative of the test suite it is possibly infringing. For example you might expect that the agent would derive variable names, function names, and the structure, sequence and organization of the code from the test suite. It might even copy comments wholesale. Those are copyrightable things. (Which is of course just the first step in analyzing whether it is infringement; there would be interesting fair use, de minimis copying, etc. arguments following a conclusion that any of those were copyrighted. A product produced this way definitely could be infringing given the right facts though.)
yeah fair - the "The canonical Git source code we're targeting to replicate the functionality of is in the git/ subdirectory." part makes this hard to argue against.
> To the extent the resulting work is a derivative of the test suite it is possibly infringing
It's this bit that I have a problem with. If I run the test, it fails and reports a failure. Now I write code and run the test again. What is the theory there that the code I wrote infringes?
If you did it in a loop until the test passed, maybe?
Your result is essentially impossible without the original. With ffmpeg, your result does not depend on ffmpeg specifically - you can use any video creation tool.
Doesn't infringe upon copyright period, because there's no creative element in that work.
Imagine a more substantial example though. Perhaps you have a test that checks that some file written in a binary format is correct, and gives names (creative elements) to each field of the format that it prints when you mess up the field, and has comments describing why the bytes are laid out like they are (the comments being copyrightable even if the facts they describe aren't), and the LLM copies those field names and comments verbatim... Now it's quite likely that the LLM's work is a derivative of the test suite.
For that assertion in particular I believe I'm practically parroting a ruling by the district court in Oracle vs Google about some extremely simple Java functions that Oracle claimed Google copied. Though I can't say I checked to make sure I'm remembering right.
You're recalling it right, but there's a nice quote from Judge Alsup in that case that talks about this exact situation:
> “So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification...”
Here given that this is rust and the original expression is C, the implementations cannot be the same by definition.
A GPL tool that processes data doesn't virally transfer the license to its output. Copyrighted ffmpeg code isn't incorporated into the video output. The LLM didn't just conjure up equivalent behavior to git without ingesting the code and transforming it as new output. There is no other behavioral description that would reproduce all needed functionality.
Substitutability probably doesn't apply here in the way you're implying, and if it did it would likely be hampered by the 9th Circuit's findings about transformation in Sony v. Connectix. Arguments here would likely look at Rust not having a stable ABI, and hence not being inherently substitutable as a library (grit-lib); it's less clear for the executable (grit-cli).
basics of copyright law - the fundamental thing being protected is the expression... is a rust program's expression the same expression as a c program? I'd say generally not.
The test suite could test aspects of the architecture/design of the codebase that are not necessary for interoperability and constitute novel expression of a piece of software in a way that is not at all language specific.
By definition a test suite is about testing interoperability with the test suite. An HTTP test suite should likely test for whether response code 418 is implemented a particular way and while humorous it would still be an interop test no?
functional parts not being copyrightable means that you can't claim a program is a copyright violation based on the fact it does the exact same thing based on compatibility reasons (you can copy what the program does). E.g. git stores refs in .git/refs, so does grit, that's not a violation. You still can't copy the program.
Yes... and now we get to the fact specific question of "did they copy the program". Or actually the answer to that is plainly "no" - they made something similar from it - and didn't run ctrl-c ctrl-v in an unlicensed manner, but "did they copy the relevant facets of the program into the new similar thing".
No. You're allowed to make a similar tool, the functional elements are not copyrightable. There's a long history, predating LLMs by many decades, of doing this in the software industry.
My use of the word "similar" does not imply here that I think it's obvious that they are "similar" in any copyrightable elements - whether they are or not is one of the interesting questions I think this case would have to resolve.
Incidentally you're also allowed to make similar creative elements so long as they aren't copies and you did so independently... which could actually come up in a case like this (imagine the LLM produced a similar function to some function in the original... but the original wasn't in the context window at the time. Not at all unlikely with code where there often is only one or two natural ways to write something).
Because compilers and LLMs do different things, and what is done matters, so you can't reason by stepping from one to the other.
Compilers don't axiomatically yield derivative works, they simply in practice do because for non-trivial programs they preserve copyrightable elements of the work in the output.
Well compilers are a mechanical transformation and if that were sufficient to free you of IP law then IP law wouldn't work.
An LLM is also a computer program which takes input and produces output related in some way to that input. However I don't think most people would view it as a "mere" mechanical transformation. One could tautologically argue that an LLM blends the user input with the training inputs which is a sort of transformation and further that the LLM itself is a computer program thus it is mechanical in nature. However it should be immediately obvious that such an overly literal interpretation is in danger of subsuming human work as well. Where the boundary lies is an unanswered question.
Related, compilers can pose a problem depending on what the output includes. For example common lisp compilers that aren't under a permissive license are a minefield because regardless of what anyone might say the image that gets output includes (approximately) the full language implementation verbatim in addition to the user's program.
I suspect that the issue is more likely that the LLM code doesn't have an author and hence some parts of it can't be licensed; it's less likely that it's infringing on git's copyright, for various reasons. (I am not a lawyer, but I do read copyright law for funsies.)
> It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements. This can include situations where a human-authored work is perceptible in an AI output, or a human makes creative arrangements or modifications of the output, but not the mere provision of prompts.
What makes you think that's what the article says that it did? There's a lot of specific nuance and it doesn't say that anywhere. In fact it speaks of making a test suite pass only. This is the classic cleanroom bios from specs approach but no need to extract it as the test is available to run and there's nothing in the GPL that suggests that running a test suite infects software that you run it on.
I'm not a copyright lawyer, but it seems pretty clear to me you can't wash a license using an LLM.
[US jurisdiction]:
Anything in the result written by the LLM can not be copyrighted by anyone.
Anything in the result written by a human can be, and if it was all emitted by the LLM then that portion originally written by a human carries its own copyright.
As a work of an LLM, the entirety presumably can not be copyrighted at all. Portions written by humans presumably carry their original copyright.
> [US jurisdiction]: Anything in the result written by the LLM can not be copyrighted by anyone.
This is a bit stronger than what the report discussing this actually finds. See part 2 in https://www.copyright.gov/ai/ for details, but TL;DR: parts where humans have control over the expression may be copyrightable. Working out which parts those are is likely a difficult question, though (it would likely require proof of provenance across many of those LLM sessions).
Not a fan of this trend of "cleaning" GPL licensed software and releasing under permissive licenses. Also why I'm not a fan of UUtils nor Canonical's early adoption of it in Ubuntu.
The intent here is extraction of all the value provided by copyleft projects without the obligation to give back. Whether it's technically legal or not, it's disgusting behavior IMO.
Knowing what you don't know is such an important skill in life and your career. And I 100% agree with you that the author is, well, off their rocker.
Let me give an example: I could take Goldeneye from the N64, extract the binary and then run it through an LLM to disassemble it and possibly rewrite it in a modern higher-level language. Do you think Nintendo would look at that and say "well, he did a lot of work so he's escaped our license"? Of course not. It's just silly.
ingesting the source code and producing output in another language is quite clearly a derivative work. You don't need to be an IP lawyer to figure that out.
Now, if you went to Claude and gave it documentation and told it to produce something that was compatible, would that be a derivative work and thus covered by the GPL? I would guess probably. But I'm not 100% sure anymore. I wouldn't risk it however.
Here's another thought experiment: what if someone takes this supposedly MIT licensed source tree, plugs it into another LLM and asks it to produce the output in C? Now how is it licensed? It might be very similar. After all, there are only so many ways to produce a SHA1 hash and so many ways to do a command line parser.
But this then makes it an interesting legal issue. In the Oracle v. Google court case, this was a key issue. Google successfully argued there's only so many ways to write a loop so just because a loop is similar to the source, that doesn't mean it's copyright infringement (as Oracle argued).
Well that is already how it is done with numerous multi-decade open rewrites of closed games. They usually require the asset pack.
I don't know how this squares with law, but Oracle v Google gave a very valuable judgment to the public that an API is not copyrightable. If we take the LLM out of it, that's all we are talking about in the pure case.
Of course, we can't take the LLM out, but it is the starting point.
This is simply plagiarism of GPL-licensed code, and license-washing as well.
I can understand working backwards from a test suite, but this literally just reads the original source:
I’m all for memory safety and such but honestly what’s the use case for this? Showing off agentic development? In 10+ years git has never failed on a memory overflow or else. Sometimes software is “good as is” and I’m pretty confident git classifies as such. I’ve also never really hit the limitations of git, even with teams of 20+ developers and lots of binary artefacts. You got to really stretch git limitations, in which case you might need to move away from git, and a rust rewrite will not help in any way whatsoever. So again … why?
I addressed this in the post, but Git has no linkable library and never has. If you want to do even something small, you need to fork/exec a process and communicate with it via stdin/out. Or completely reimplement it and all of the edge cases - for example, reading even one object can be either loose (easy) or in a packfile (much more difficult). Reading a reference (what SHA does a branch point to) can be in a loose file, a packfile, or a reftable. etc.
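To make the "a reference can live in more than one place" point concrete, here is a minimal sketch in Rust of resolving a ref by hand. It is not taken from grit, Gitoxide or libgit2; the function names `resolve_ref` and `lookup_packed` are made up for illustration. It assumes the classic files backend only (loose files plus `packed-refs`) and ignores reftable, worktrees and ref name validation, which is exactly the kind of edge case a real library has to cover.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Resolve a ref name (e.g. "HEAD" or "refs/heads/main") to an object id.
/// Sketch only: loose refs, symbolic refs and packed-refs; no reftable.
fn resolve_ref(git_dir: &Path, name: &str) -> io::Result<Option<String>> {
    let mut name = name.to_string();
    // Bound the number of symbolic hops so a ref cycle cannot loop forever.
    for _ in 0..5 {
        match fs::read_to_string(git_dir.join(&name)) {
            Ok(content) => {
                let content = content.trim();
                match content.strip_prefix("ref: ") {
                    // Symbolic ref: follow it on the next iteration.
                    Some(target) => name = target.to_string(),
                    // Loose ref: the file holds the object id itself.
                    None => return Ok(Some(content.to_string())),
                }
            }
            // No loose file: the ref may only exist in packed-refs.
            Err(e) if e.kind() == io::ErrorKind::NotFound => {
                return lookup_packed(git_dir, &name);
            }
            Err(e) => return Err(e),
        }
    }
    Ok(None)
}

/// Scan the packed-refs file for an exact ref name.
fn lookup_packed(git_dir: &Path, name: &str) -> io::Result<Option<String>> {
    let packed = match fs::read_to_string(git_dir.join("packed-refs")) {
        Ok(text) => text,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e),
    };
    for line in packed.lines() {
        // Skip the header comment and "^<id>" peeled-tag lines.
        if line.starts_with('#') || line.starts_with('^') {
            continue;
        }
        if let Some((id, refname)) = line.split_once(' ') {
            if refname == name {
                return Ok(Some(id.to_string()));
            }
        }
    }
    Ok(None)
}
```

Calling `resolve_ref(Path::new(".git"), "HEAD")` would follow `HEAD` to the current branch and then check the loose file before falling back to `packed-refs`. Objects are the same story with more work: a loose object is one zlib-compressed file, a packed one needs index lookup and delta resolution.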
There is no way anyone would ever use this for its CLI - it will almost certainly always be slower and worse in every way, even if I get it stable (which it's currently not). You can use libgit2 (a project I also helped kickstart), or Gitoxide (a project GitButler also currently helps drive) - they are faster and better in nearly every way, but they are not feature complete.
This isn't for the person using Git. This is for someone trying to build a tool that wants to use parts of Git, which is different.
But libgit2 exists, right? It may not have 100% feature parity with git, but that's a linkable library that gives you a lot of functionality when working with git repos.
the author of this post (whom you were responding to) made `libgit`, the library that preceded `libgit2`, and contributed to libgit2 a long time ago as well. Here he is in 2010 writing about libgit2: https://github.blog/news-insights/libgit2-a-git-linkable-lib...
I work on Beagle, a git-compatible SCM [1]. I use ABC, Abstractionless C [2] dialect with slices, optional range checking, etc. So far, memory safety was the least of my concerns, frankly. Most of the thorny issues would be equally thorny in Rust (e.g. right now: reflog zeroed when VM ran out of disk space; must be some state machine issue or an OS level glitch). Also, forking off a C process (no runtime) is cheap enough that you actually want to do that more.
But, those are all technicalities. The key issue I see with the approach: the data structures and algos of git have been fanatically fine tuned for that particular application with those particular usage patterns. By very sophisticated low-level C programmers. So, quite likely, any other app/lib working with that store will always be a suboptimal fit. I would recommend read-only access only, esp for LLM code.
Meanwhile, git's underlying data model (blobs/trees/commits) is very simple and very much internet-standard level. Decoupling at that interface is so much easier, with far fewer issues looming.
May look different from your vantage point though.
It might have missing pieces, but it’s easier to vibecode any needed networking additions to Gitoxide (which is maintained) than to just go and burn tokens trying to clone all of git again.
Git wants to add Rust. Gitoxide is a multi year project that’s going to be more maintained than an ad-hoc “it says it passes the test” vibeclone.
I’m not even against vibecloning things when it’s useful, but this shows no benefits. Git is a beloved tool that few people dislike, it’s not like vinext (people disliking the vendor lock-in they have with nextjs).
Also execs should keep in mind that “we burned thousands of dollars on tokens to re-create this beloved software so we can have our own copy”, even without the copyright/licensing argument, just isn’t something positive that the community will react positively to.
It doesn’t feel nice to see your favourite works cloned for no benefit. We’re past the “it was an experiment to see how far AI can go” stage now.
> It's like giving wishes as a genie. You gotta be super explicit with the ground rules.
I have used the genie analogy before. It used to feel more like a Golem but now with the whole Fable sabotage mode https://jonready.com/blog/posts/claude-fable5-is-allowed-to-... it certainly feels more Genie-like.
Previously I described it as "Models give you what you ask for, not what you want". Now with Fable they don't even give you what you want so idk.
Currently some act like it is fine to translate a project and change the license.
Recently Casey Muratori said in an adjacent context that the Microsoft AI push may be related to the fact that they have a long-standing and elaborate codebase. A large historic software company could have advantages in training models. They could provide extra value with their IP.
Now their IP is potentially in their models and accessible to anyone. If they actually train models on their IP, anyone could implement their APIs and slap a GPL license on it.
A lot of their IP has been leaked over the years anyways. Source code of Windows XP is easily available, and there was the 2022 leak that contained the sources of Bing, Bing Maps, Cortana, etc
I'd be really interested in the opposite, just for the sake of experimentation since that's what these projects mostly are. They all seem to be rewrites for the sake of "performance", because the cost is now lower bc of AI. I'd be interested to see something like a port of Quake III in Python or Kubernetes in Perl, even Rails in Python would be goofy and really fun to see
For Natural Selection 2, it was mainly the gameplay logic that was Lua, all running on their bespoke C++ game engine called Spark. But yeah, modern Python and Lua can be pushed to high performance.
Ah my mistake - I know they ditched Source to pursue their own engine and thought they were going all in on Lua, but doing the whole engine in Lua would have been a struggle I think.
> They all seem to be rewrites for the sake of "performance".
And yet this performs dramatically worse.
A slower, untested, incomplete git implementation, all for the low low price of $10-$15,000.
And don’t forget it wasted a bunch of human time in the process.
So if someone mentioned somewhere else there is already a Rust port a group is doing somewhere. How much could they have accomplished with this much money and time in software development resources?
Ok. AI can seemingly port stuff if you don’t test it thoroughly. I think that’s already been proven. At this point I’m seeing less and less value from these kinds of things. I’m sure it was fun for the author, but how does it help other people?
If the first stereotype of Rust programmers is announcing that a project is in Rust before any other desirable software property (e.g. stable, performant, etc), the second stereotype is that Rust programmers love rewriting stuff in Rust, just for the sake of Rust.
(The 2.a. corollary is that they love rewriting GPL projects specifically and downgrading them to MIT/Apache)
But there is already gitoxide, an established git reimplementation in Rust. It even provides a library.
gitoxide was started in 2018, back when we were all writing code by hand, and has some reasonable adoption in the rust ecosystem. It's not feature complete, but if that was the issue then surely fixing that would be better than starting from scratch
Well, it's sort of for Rust. GitButler is written in Rust and Jujutsu is written in Rust and we're both depending on fork/exec'ing to an unknown Git binary with no linkable library and no control over the subprocess to do a range of networking stuff. Neither Gitoxide nor libgit2 is capable of this either, as much as I love and support those projects.
This project is entirely about providing a feature complete (even if sloppy) library implementation of Git, which does not otherwise exist.
But... it's memory safe. Not that git has any important memory issue, but now people with skill issues in C can contribute to it without breaking stuff.
This is a problem with people with LLM psychosis who now think they have superpowers, they are completely unaware and just do things naively, they've lost all ability to think for themselves. The LLMs that are thinking for them certainly aren't going to tell them doing X is a bad idea, they're there to produce as many tokens as possible for their owners.
> A pretty fun experiment and I think we can shape this into something truly useful to the whole community.
Agree with first half of this sentence, we should all have fun with experiments.
> It was never based on a linkable and reentrant library, but instead on a "Unix" philosophy of chaining together simpler commands, which means that it's difficult to use it in long running processes without fork/exec overhead for everything.
Added it in full. It still squarely falls under "this is for fun/are you seriously doing this for this purpose" territory for me.
git operates on the filesystem level, the unix behavior is just getting buried. You cannot rewrite git into a linkable library and decide it's now not unix. Its entire behavior is unix, which is why it's awesome.
My intent with this project is not to replace Git in any way. I don't care about the CLI part of this project.
The point is to provide a feature-complete reentrant linkable library. Even if it's an ugly and slow one, this is still the only one thing that exists that covers those points - Gitoxide and libgit2 are both awesome but they are not feature complete.
Git is famously not built around a (reusable) library, hence why we have things like libgit2 (unrelated to git) and why any porcelain on top of git has to resort to calling the binary and parsing its text output.
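For anyone who hasn't written this kind of wrapper, here is a rough Rust sketch of what "calling the binary and parsing its text output" looks like. The `StatusEntry` type and both function names are invented for this example; `git -C <dir> status --porcelain` is the real command, but the parser below only handles the simple `XY path` lines of the v1 format and leaves renames and quoted paths as raw text.

```rust
use std::io;
use std::process::Command;

/// One line of `git status --porcelain` (v1): two status characters, then a path.
#[derive(Debug, PartialEq)]
struct StatusEntry {
    index: char,
    worktree: char,
    path: String,
}

/// Parse porcelain v1 text. Sketch only: rename lines ("old -> new") and
/// quoted paths are kept as raw text rather than decoded.
fn parse_porcelain(output: &str) -> Vec<StatusEntry> {
    output
        .lines()
        .filter_map(|line| {
            let mut chars = line.chars();
            let index = chars.next()?;
            let worktree = chars.next()?;
            // The two status characters are followed by a single space.
            if chars.next()? != ' ' {
                return None;
            }
            Some(StatusEntry {
                index,
                worktree,
                path: chars.as_str().to_string(),
            })
        })
        .collect()
}

/// Spawn the git binary (one new process per call) and parse what it prints.
fn git_status(repo: &str) -> io::Result<Vec<StatusEntry>> {
    let out = Command::new("git")
        .args(["-C", repo, "status", "--porcelain"])
        .output()?;
    if !out.status.success() {
        let msg = String::from_utf8_lossy(&out.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, msg));
    }
    Ok(parse_porcelain(&String::from_utf8_lossy(&out.stdout)))
}
```

Every call to `git_status` pays for a process spawn, and the caller only ever sees whatever the installed git version chose to print, which is the overhead and lack of control being complained about in this thread.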
libgit.a isn't reentrant. It will call `die()` on many errors. If you link to it in a long running binary, it will kill your process on error.
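A small Rust sketch of the difference, with hypothetical names throughout (`parse_oid`, `parse_oid_or_die` and `ObjectError` are not from git or any library mentioned here). The first function mimics the `die()` pattern by exiting the process with status 128 on bad input; the second returns the error so a long-running host can keep going.

```rust
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ObjectError {
    BadLength(usize),
    BadCharacter(char),
}

impl fmt::Display for ObjectError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ObjectError::BadLength(n) => write!(f, "expected 40 hex characters, got {}", n),
            ObjectError::BadCharacter(c) => write!(f, "invalid hex character {:?}", c),
        }
    }
}

impl std::error::Error for ObjectError {}

/// Library style: report the problem and let the caller decide what to do.
fn parse_oid(hex: &str) -> Result<String, ObjectError> {
    if hex.len() != 40 {
        return Err(ObjectError::BadLength(hex.len()));
    }
    if let Some(c) = hex.chars().find(|c| !c.is_ascii_hexdigit()) {
        return Err(ObjectError::BadCharacter(c));
    }
    Ok(hex.to_ascii_lowercase())
}

/// die() style: fine for a short-lived CLI, fatal for any process that links it.
fn parse_oid_or_die(hex: &str) -> String {
    match parse_oid(hex) {
        Ok(oid) => oid,
        Err(e) => {
            eprintln!("fatal: {}", e);
            std::process::exit(128);
        }
    }
}
```

A GUI or server that called `parse_oid_or_die` on user input would vanish on the first typo; with `parse_oid` it can show the error and carry on.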
Libgit2 is meant to address this and I was heavily involved in the development of that project 15 years ago. It's great but it's not feature complete and its development is also completely separate from git development, so it's out of sync and constantly struggling to keep up.
You're asking people to trust you and hand their codebase/IP to your tool while showing them exactly how you treat other people's code/licenses by "deciding" to not carry forward the GPL license.
I have been working on the same problem in other areas. My ultimate goal is to rewrite nginx in Rust, passing as many of the upstream tests as possible while leveraging the strongest aspects of Rust's ecosystem, e.g. rustls (modern memory safe OpenSSL), Tokio (async runtime), h2 (http 2 impl), rather than implementing from scratch like the upstream. I started with Lua, then ported over Valkey, and am now working on nginx. The reason was that I wanted to learn the ins and outs before taking on the most complex portion.
Happy to answer any questions on the approach! When I started a few weeks ago the harnesses on their own were not good enough to get very far without a "meta harness" of sorts but that is changing largely with Claude Workloads and Mythos. A lot of the work is developing some custom tooling to move these along faster.
Yeah I got one, why? You aren't learning anything, you are just copying code from other codebases and smashing it together to make some nginx-rust thingie... for what actual goal?
Well the biggest goal was to be useful. Nginx serves ~20% of the web, and memory-unsafe languages might just become intractable for critical web-exposed infra if the rate of critical CVEs on these rises faster than they can be patched, so a drop-in replacement would be a big deal in that world.
But in terms of learning I'm learning relatively little about how to type Rust into an editor but a lot about how to set up agentic loops that can autonomously get tests to pass and improve performance.
For example if you just tell a frontier model (gpt5.5 or Claude Code 4.8) to make some portion of the tests pass they will take forever and just bang their heads against it. I developed a framework to mimic a lot of these tests in nginx... but in minimum non blocking ways so you can run many in parallel with short feedback loops.
Similar for performance - how to make tons of performance benchmark and expose maximum telemetry for agents to go and analyze the hotpaths etc.
You mean you rewrote the nginx test suite with smaller leaner tests ? How did you bootstrap that ? How do you know the leaner tests are equivalent to the real ones ?
Basically I use these "kits" to prove that the behavior is working as expected with mocked data/ interfaces and then only after these kits pass I'll run the real test suite files as confirmation. So these let you iterate a lot faster than the official test suite because it is very slow.
These are bootstrapped from the real tests.
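A guess at the general shape of such a "kit", as a Rust sketch; none of this is the commenter's actual tooling, and `Upstream`, `MockUpstream`, `handle` and `run_kit` are hypothetical names. The idea shown is only the two-tier pattern described above: exercise the behavior against an in-memory mock first, and run the slow real suite only once that passes.

```rust
/// Hypothetical interface the server code talks to. A real build would back
/// this with sockets; a kit backs it with canned data.
trait Upstream {
    fn fetch(&self, path: &str) -> Option<String>;
}

/// In-memory stand-in used by the kit.
struct MockUpstream {
    pages: Vec<(&'static str, &'static str)>,
}

impl Upstream for MockUpstream {
    fn fetch(&self, path: &str) -> Option<String> {
        self.pages
            .iter()
            .find(|(p, _)| *p == path)
            .map(|(_, body)| body.to_string())
    }
}

/// Behavior under test: map an upstream lookup to a status code and body.
fn handle(upstream: &dyn Upstream, path: &str) -> (u16, String) {
    match upstream.fetch(path) {
        Some(body) => (200, body),
        None => (404, String::from("not found")),
    }
}

/// A "kit": a fast check with no network or disk, meant to run before
/// the slow upstream test suite.
fn run_kit() -> Result<(), String> {
    let mock = MockUpstream { pages: vec![("/", "hello")] };
    let cases = [("/", 200u16), ("/missing", 404u16)];
    for &(path, want) in cases.iter() {
        let (got, _) = handle(&mock, path);
        if got != want {
            return Err(format!("{}: expected {}, got {}", path, want, got));
        }
    }
    Ok(())
}
```

Because a kit like this touches no sockets or files, many of them can run in parallel with short feedback loops. The open question from the parent comment still stands, though: passing a kit only shows agreement with the mock, which is why the real suite has to stay as the final confirmation.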
The other commenter was being a bit dismissive but this is the kind of thing I'm taking away as a real useful pattern to do verification of behavior at scale.
> Nginx serves ~20% of the web, and memory-unsafe languages might just become intractable for critical web-exposed infra if the rate of critical CVEs on these rises faster than they can be patched
That is true, however did you actually do any research into nginx? Is it particularly prone to memory bugs?
I honestly don't know the answer but you seem to be coming from a place of C bad, therefore nginx super vulnerable?
In my experience with other web servers the vast majority of security bugs are string handling related (path/header injection), which your rewrite will not protect you from.
The project was inspired by that. Also, unlike most other projects, nginx is often directly exposed to the internet, which makes it more vulnerable than e.g. Redis/Valkey or something that would generally be running within a company's network.
"C Bad" is a bit reductionist... but I think there is some truth to the take " Until you have the evidence, don’t bother with hypothetical notions that someone can write 10 million lines of C without ubiquitious memory-unsafety vulnerabilities – it’s just Flat Earth Theory for software engineers" [1]
NSA and other government orgs are also pushing people to stop using C [2] for important software.
I think the risks of a rewrite - especially when using AI - are far more problematic than memory safety. In the long run those C projects will be memory safe in the next five years using memory safe C implementations.
My perspective is that it is good to have a beta in a lot of directions.
No one really knows what the endgame of software security looks like.
So some people should try the port to rust angle, some should focus on hardening the C, some should explore more exotic options like formally provable languages etc
There is nothing wrong with trying different things. But the fundamental problem here is that projects and their communities are social projects and need to be to fulfill their purposes and to ensure long term maintenance. In a free software context, rewrites just like forks (1) are fundamentally an asocial (2) activity because they fragment the community (if successful) and then increase overall maintenance burden if not able to replace the original project completely (rarely the case). Disrespecting the license choice of the original authors makes this worse.
1) There may be situations where a fork makes sense (e.g. because one project can not serve different use cases well).
2) Which is why usually a "higher goal" is used to justify this, e.g. authors pretend (or lie to themselves, or may be stupid enough to actually believe) that some improvement in memory safety is really that important.
You won't get anywhere pressing your case here. This group has already found you guilty, and no argument will change their minds.
You've been caricatured into a blind AI-follower rust-rewriter-just-because type, and that's the surface they'll continually attack (you're wasting time, hurting the community, v2-itis, bikeshedding, premature optimization, copyright violation, moustache-twirling-evil-intent-rug-pull-later, etc etc etc).
Just continue in your work. It's good, and we need people like you.
There are plenty of Rust based reverse proxies out there, why do you need to specifically rewrite nginx? You could also write a config adapter to Caddy, there are a billion options, but this is a wasted effort. The people who want to stick to their nginx configs won't use your project ever, and the people who actually care about security aren't going to use a vibe coded project.
I have no idea why you are making me spell this out, I thought it was pretty obvious.
This is coming from a cofounder at github, someone who probably knows precisely what the GPL is for. Whatever the legal merits, building on a GPL3 project's complete test suite and relicensing under MIT is not acting in good faith toward the original authors. I really find it disgusting and it makes me want to avoid gitbutler entirely.
I think you're saying that you don't believe in the freedoms to use the GPL licensed test suite for certain purposes which are explicitly allowed by the GPL.
You don't get to choose a license and then add extra terms to it when you don't feel like it's up to scratch. That's something explicitly not allowed by the GPL license.
> Where does the GPL say you have the freedom to relicense code or derivatives under MIT by fiat?
The first part of this sentence (where in the GPL) is unreached if the second part of it is unmet (relicense code or derivatives) which I contend it likely is. You're begging the question.
However:
> The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work
earlier:
> A “covered work” means either the unmodified Program or a work based on the Program.
It's that element that would be difficult to prove "work based on the Program"
Asking an LLM "here's a thing, rewrite it in Rust" is pretty clearly creating either a derivative work or a different form of the same work, just like asking a transpiler would.
There's no evidence that "here's a thing, rewrite it in Rust" is the technique Scott used here.
"here's a test suite, write code in rust that makes that suite pass" is reasonably supported by the article. That would likely not be a derivative work.
Ew. So it tells the LLM where the git source is for the thing they’re duplicating, but I don’t see instructions saying not to read or copy those files or algorithms.
I could have missed them. I didn’t read everything. I did some quick searches.
But the fact they’re not obvious is kind of troubling. So is the fact that they didn’t just give the LLM the tests and documentation, without the source, to prevent it from looking; that would hurt any case they had for clean-room privileges in my eyes, ignoring my other comment with concerns about using the tests at all.
If we assume an unlicensed or MIT licensed test suite, an LLM could develop from that and documentation and you’d get something you could license MIT.
IMO, IANAL, etc.
And we’ll ignore the question of what the fact the LLM has certainly seen the git code during training means.
But the test suite would have to stay under the original license. And if you use a GPL test suite as the kernel to develop a program from, can you license it non-GPL? I’d question that personally. Same acronyms above apply.
This is the exact thing I'm not sure about. See https://news.ycombinator.com/item?id=48470397 where I posit a simpler question: if a `test_sum()` function is copyrighted, does writing a `sum(a, b)` function infringe on the copyright of the software product that `test_sum()` is a part of? I'd say no. There's another part of the GPL that applies here:
> A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an “aggregate” if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate.
So assuming that sum(a, b) is non-infringing and not combined to form a larger program (i.e. the tests aren't compiled into the grit code), then the GPL explicitly doesn't apply to this use.
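To make the thought experiment concrete, here is a minimal sketch of the two pieces being discussed. The names `sum` and `test_sum` come from the comment above; the bodies and the values checked are invented for illustration and are not taken from Git or Grit.

```rust
/// The implementation under discussion: written independently,
/// only so that the test below passes.
pub fn sum(a: i32, b: i32) -> i32 {
    a + b
}

/// Stand-in for the (hypothetically copyrighted) test. It observes
/// `sum` only through its public signature and return values.
pub fn test_sum() -> bool {
    sum(2, 3) == 5 && sum(-1, 1) == 0
}
```

The two functions live side by side here only so the sketch is self-contained; in the scenario described, the test would ship separately from the implementation.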
test_sum is assumedly relatively trivial. So as a lay person I’d expect some sort of obviousness test to apply. Like so much of the stuff in the Google/Oracle lawsuit.
But if you take all the individual tests used to test git as a whole, that seems far more unique. Seems like at that point you’re really having to duplicate the actual git internals, and that seems like it should be covered.
> test_sum is assumedly relatively trivial. So as a lay person I’d expect some sort of obviousness test to apply. Like so much of the stuff in the Google/Oracle lawsuit.
Feel free to extrapolate to the threshold where it's no longer trivial and apply the question at that point.
> you’re really having to duplicate the actual git internals
Copyright covers the expression, not the method. So the Rust function:
It'd seem weird to plan to use this until the readme stops saying
> it has been nearly entirely written by agents and has not been used for realsies. It's probably currently unusably slow or completely broken in ways that are not exercised in the test suite.
Right now it's someone else's experiment that is still in the "might or might not pan out" stage.
There are a bunch of projects using the similar (not vibe coded, less fully featured) gitoxide project - there is demand for git-as-a-library.
I would not use this except to help us test it if interested. I'm announcing it because it's interesting and a milestone in the breadth of test coverage it can pass. It almost certainly cheated on a bunch of those tests and is not feature complete yet.
The author of gitoxide is also working on GitButler (who worked on this project) and we're pushing both projects forward and actively using and developing Gitoxide as well. This is simply a different and hopefully complementary approach to the same problem.
I was immediately excited about this wrapped in Python because the current Python git bindings are kind of obtuse, but they do work so I guess I can't complain.
Wordpress is/was successful because it's braindead and has a solid userbase. I'm not trying to flame WP; targeting a specific group of consumers is a quality in itself.
It's an organic success, hard to replicate. If at all, CF can only make people migrate with massive effort. Marketing effort, selling lots of snake oil in the process. WP won't just hop on the hot new thing, WP is the definition of the opposite. It works for them. Why change?
Git is the same on the other side. It requires maintenance and improvements, surgical and correct. No git maintainer has time to learn a gigantic new codebase and they will stick with what works for them. For git users there are no advantages. So similarly it would require a long-term effort to push the project, building trust that it is somehow better, probably requiring Linus to say "it's great".
> The full build of all Git functionality in Rust is currently around 27M, but since a large part of it is a library, it could clearly be easily split up into domains of functionality - subcrates that do specific things.
I downloaded v0.3.99 for Linux x86_64 and stripped the binary. It ends up at 31 MB. The .text section is 25 MB.
I'm surprised by the large size. On my system /usr/bin/git is 4.7 MB, although git is split up into multiple programs. I'm not comparing apples to apples, but this is weird.
If anyone digs into the binary size, please share what you find.
I haven't dug into this at all yet, nor have I tried to optimize the size (or really, anything else).
However, the library part will be less than half of this - a lot of code is spent on the CLI specific stuff and would not be part of the library, which is mostly what I care about for the purposes of this project. The CLI part is just to try to prove the point that it actually does what Git does. The library part is what might be useful in that nothing else exists that does all of the things that it does (provide a reentrant linkable library that is feature complete with Git).
Looking at just the `grit` executable, 58 of the top 200 largest file->resulting code sources are from clap-rs' derive functionality i.e. it's the command-line parsing. The #1 largest is, surprisingly, merge_trees[1] which comes in at 183KiB final binary size. There isn't so much code in that file that it seems reasonable, so it's potentially one of the derives in use (Debug being a common culprit for bloat) that's blowing it out. After those outliers it starts to level out quickly.
Splitting it by crate: `grit` is 13.6MiB, `grit_lib` is 4.8MiB and then it's `std`, `rustls` and `regex_automata` that are the next largest. So as pure library you could hopefully shave off quite a bit of that 25MiB.
What’s the long term strategy for this code base? Does the author expect community code contribution or just bug reports or maybe just test contributions?
I'm happy to take contributions if you want to throw some tokens at it. Bug reports would be amazing, since I haven't tested it for real very much (enough to know you can do basics).
I want to get it to the point where we can replace fork/exec'ing to an unknown Git binary or having said binary be an external dependency for GitButler. The networking stuff (push/fetch) is currently an external dep for both GitButler and Jujutsu (and pretty much every other Git-based tool in the world). I'm pretty sure I can get the project good enough at these networking ops (including all the hairy credential stuff) to be able to not need those fork/exec calls.
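For context, the fork/exec pattern being replaced usually looks something like the sketch below. `run_capture` and `head_commit` are hypothetical helper names, not GitButler or Grit code; the only real interfaces assumed are Rust's `std::process::Command` and the `git -C <dir> rev-parse HEAD` command.

```rust
use std::io;
use std::process::Command;

/// Spawn an external program, wait for it, and return its stdout.
/// Fails if the program cannot be started or exits with a non-zero status.
pub fn run_capture(program: &str, args: &[&str]) -> io::Result<String> {
    let output = Command::new(program).args(args).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{} exited with {}", program, output.status),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).trim_end().to_string())
}

/// Ask whichever `git` happens to be on PATH for the commit HEAD points to.
/// Every call pays for a process spawn and depends on that external binary.
pub fn head_commit(repo_dir: &str) -> io::Result<String> {
    run_capture("git", &["-C", repo_dir, "rev-parse", "HEAD"])
}
```

A linkable library would turn `head_commit` into an in-process function call, which is the dependency and overhead the comment above is hoping to remove.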
The agents did all the work but _somebody_ has to test it for real on their own data to find the edge cases overlooked by AI. That's what users are for nowadays.
Grit was the name of a _Ruby_ implementation of git way back when: https://github.com/mojombo/grit/. I believe it's actually what GitHub was built on then.
I created and named the Grit library that used to power GitHub. Scott Chacon (fellow GitHub cofounder, now CEO of GitButler) specifically asked my permission to re-use Grit as the name of this project, which I gladly granted. R is for Ruby. R is for Rust! Grit is dead. Long live Grit!
I started the project as Gust, but felt like Grit was such a better name. I asked Tom if I could boot the name back up again because I always liked it and he said it was fine.
Also, I worked on the Ruby Grit pretty extensively during the early days of GitHub, so hopefully I earned the right to carry on the mantle. :)
if we conquer the universe, i would love to leave one planet alone for rust users. in this planet, the only allowed programming language is RUST! everything should be written in RUST
We're choosing a license that is usable by the entire community. Our goal is a linkable library, which makes GPL impossible. If we had chosen to go with LGPL or GPL with linking exception (like libgit2), it would have the same issue of changing the license, so we went with whatever was the most permissive so everyone could use it for anything if they wish. This has nothing to do with business - I hope I can get the project to the point where Jujutsu or whomever can use whatever is valuable here for whatever they want.
We clearly learned from how Git does operations and emulated it in order to function interoperably, the same way that Gitoxide and libgit2 have, and released it under a license that would be the most valuable for people wanting to use a linkable library, the same way that Gitoxide and libgit2 have.
> Our goal is a linkable library, which makes GPL impossible
Not impossible. It forces the code using the library to be under a GPL-compatible license and requires the binary to be released under the GPL license.
The distinction is quite important. It's only impossible in the mind of someone who wants to release proprietary software. Even for people releasing software under a permissive license it's not impossible, just highly inconvenient (and the LGPL is always an option in this case).
>We're choosing a license that is usable by the entire community.
What a weaselly way to put it.
A GPL library, as I'm sure you know, is perfectly usable by anyone including jujutsu and anyone else. They just have to also license under the GPL and this is no barrier to open source projects.
> Currently both Gitoxide and libgit2's networking functionality is either partial, slow or non-existant. Both GitButler and Jujutsu rely on forking out to Git in order to push or pull data. A big reason for this is the incredibly complicated credential logic involved, but all of this is (theoretically) currently covered in Grit.
they still haven't explained why I should bother. Is it faster, easier, more efficient, more capable, more scalable on large codebases, supports better workflows?
In fact, I would rather it stay C for 15 more years.
I'm assuming you didn't read the article, since I'm pretty sure I covered all of this, but I'm happy to respond.
Don't bother.
It's probably not for you. It's slower, more obtuse, more bloated, less capable, exponentially less scalable at any size. Canonical Git is better in every way, except being a linkable library.
Even in the arena of linkable libraries that can do Git stuff, Gitoxide (Rust) and libgit2 (C, with Rust bindings via the git2 crate) are both better; they're just not feature complete. That is the only point of this project.
I hate all you llm people, you're ruining everything, shoving slop down everyone's mouth and telling us you now have superpowers. No you're stealing and making everything as shitty as possible in the process.
I pray everything switches to usage based billing and the curtains can close on this era.
> the result is Grit, a from-scratch, library-based, memory-safe, idiomatic Rust reimplentation of Git that passes over 99% of the entire Git test suite.
Why not 100%?
> It's not actually passing every single test, though that is on purpose. I did mark some parts of the testing suite as "skipped" because I don't think it's worth recreating them in a library like this
> 41,715 / 42,001 tests passing (99.3%)
So it is not the entire suite then, but somehow that was worth burning ~$8,000 worth of tokens?
> It's not actually passing every single test, though that is on purpose. I did mark some parts of the testing suite as "skipped" because I don't think it's worth recreating them in a library like this - email related stuff, i18n, perforce/svn importers, some of the midx/bitmap stuff - things of that nature. However, for everything that I'm sure is relevant to nearly anyone reading this, the Grit library/CLI can now fully pass the Git test suite.
I think we are talking about ROI in terms of solving real world problems and making real impact, not the fact that a tool has been ported from language X to language Y.
Given the author already admitted that the implementation was slow anyway, you are no better off than with gitoxide, which also supports Windows, whereas Grit does not.
> Currently both Gitoxide and libgit2's networking functionality is either partial, slow or non-existant. Both GitButler and Jujutsu rely on forking out to Git in order to push or pull data. A big reason for this is the incredibly complicated credential logic involved, but all of this is (theoretically) currently covered in Grit.
> One of the main things I would like to be able to use it for is to be able to bundle complex push/fetch functionality into GitButler and other standalone Git tools needing network functionality (such as Jujutsu).
> Having parts of Git as discrete, embeddable slices of library also enables things like building custom Git servers or client functionality in Rust.
> The full build of all Git functionality in Rust is currently around 27M, but since a large part of it is a library, it could clearly be easily split up into domains of functionality - subcrates that do specific things. Perhaps you could simply use the subset you need.
The article starts out with paragraphs about the history and motivation.
> it made me wonder about the feasibility of using that same approach to accomplish something I've been dreaming about for 15 years now,
> which means that it's difficult to use it in long running processes without fork/exec overhead for everything.
> What if we used the same basic idea that Anthropic used on their from-scratch C compiler? Start a brand new implementation, design it as a Rust library, then throw a swarm of agents at the problem
There are often good reasons for these purposeful digressions. E.g. in nginx the unit tests cover ciphers that are considered unsafe and not supported by modern libraries like rustls https://github.com/rustls/rustls. It is reasonable to make a new implementation and leave behind a bit of baggage.
It depends whether the 0.7% failures are testing deliberately unimplemented features like email or are in corner cases of implemented features. It sounds like it's at least mostly the former, hopefully it's 100% the former.
I don't care if any git I use has email features. IIUC, even most of the people that use git with email don't directly use the email features, they use the patch set features like `git am`. I expect `git am` to work, I don't expect git to actually do email.
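As a quick check, the 99.3% and 0.7% figures in this subthread both follow from the counts quoted from the article (41,715 of 42,001). The helper below is only an illustrative sketch; `pass_rate_summary` is a made-up name.

```rust
/// Format a pass/fail summary from raw test counts.
/// Assumes `passing <= total` and `total > 0`.
pub fn pass_rate_summary(passing: u32, total: u32) -> String {
    let not_passing = total - passing;
    let pct = |n: u32| f64::from(n) * 100.0 / f64::from(total);
    format!(
        "{} / {} passing ({:.1}%), {} not passing ({:.1}%)",
        passing,
        total,
        pct(passing),
        not_passing,
        pct(not_passing)
    )
}
```

With the quoted counts this gives 286 tests not passing, which covers both the skipped areas and any genuine failures; the counts alone do not say how the 286 split between the two.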
In the age of AI, writing things that used to take years can now be done in months or weeks if you have deep enough pockets for it.
Reimplementation is a particularly juicy target because it's easy to test. Imagine someone writing a better browser than Chrome from scratch in just a year.
Because of this, moats around businesses that rest on difficulty of implementation are effectively gone.
> In looking at the code that the LLMs have produced for the project, especially given the pretty massive and widespread architectural changes needed to make the implementation libified and memory safe, we decided that the codebase is not a derivative work that would require carrying forward the GPL license and have decided to release the code under the MIT instead.
Hmm. That's going to be interesting.
Obligatory: https://github.com/chardet/chardet/issues/327
they would be just wrong. I hope someone with standing sues
I don't think it's that clear cut. The functional parts probably aren't copyrightable, only the stylistic ones. It's going to be a mix of courts applying laws in new ways that haven't been tried before and fact-specific questions about what actually persisted through the LLM if it goes to court.
I'd be fascinated to see what happens if it does. Both in the analyses that we'd get of what the LLM did to the codebase and on the legal decisions on what the copyrightable creative elements in code actually are.
If I was the author though... there would be no way that I would be volunteering to be a test case like this. Also seems just rude for no reason.
It probably would have been less bad if he had chosen MPL-2.0 or LGPL-2.1-or-later. But he chose MIT, which cuts at the core of the intent of licensing the project with a share-alike license.
Tell me, can I create a copyrighted video that's not GPL licensed using ffmpeg? Now tell me how creating a rust library using the git test suite is different?
> using the git test suite
That's not actually the case at hand here - the agents were given the original source to reference: https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...
But for the sake of argument: The test suite itself is copyrighted. To the extent the resulting work is a derivative of the test suite it is possibly infringing. For example you might expect that the agent would derive variable names, function names, and the structure, sequence and organization of the code from the test suite. It might even copy comments wholesale. Those are copyrightable things. (Which is of course just the first step in analyzing if it is infringement, there would be interesting fair use, de-minimis copying, etc arguments following a conclusion that any of those were copyrighted. A product produced this way definitely could be infringing given the right facts though).
> That's not actually the case at hand here - the agents were given the original source to reference: https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...
yeah fair - the "The canonical Git source code we're targeting to replicate the functionality of is in the git/ subdirectory." part makes this hard to argue against.
> To the extent the resulting work is a derivative of the test suite it is possibly infringing
It's this bit that I have a problem with. If I run the test, it fails and reports a failure. Now I write code and run the test again. What is the theory under which the code that I wrote infringes?
Simplify this down:
Assume the following is copyrighted:
Does writing the following code:
infringe on the test copyright?
If you did it in a loop until the test passed, maybe?
Your result is essentially impossible without the original. With ffmpeg, your result does not depend on ffmpeg specifically - you can use any video creation tool.
Writing
Doesn't infringe upon copyright period, because there's no creative element in that work.
Imagine a more substantial example though. Perhaps you have a test that checks that some file written in a binary format is correct, and gives names (creative elements) to each field of the format that it prints when you mess up the field, and has comments describing why the bytes are laid out like they are (the comments being copyrightable even if the facts they describe aren't), and the LLM copies those field names and comments verbatim... Now it's quite likely that the LLM's work is a derivative of the test suite.
> Doesn't infringe upon copyright period, because there's no creative element in that work.
There's likely a threshold at some point. It's helpful to look at a minimal case and then continue from there though.
I'm curious if there's case law that supports your assertions here?
For that assertion in particular I believe I'm practically parroting a ruling by the district court in Oracle vs Google about some extremely simple Java functions that Oracle claimed Google copied. Though I can't say I checked to make sure I'm remembering right.
You're recalling it right, but there's a nice quote from Judge Alsup in that case that talks about this exact situation:
> “So long as the specific code used to implement a method is different, anyone is free under the Copyright Act to write his or her own code to carry out exactly the same function or specification...”
Here, given that this is Rust and the original expression is C, the implementations cannot be the same by definition.
That's essentially the same thing as modding a game, though. I know there have been lawsuits to stop modding, but I don't think any were successful.
A GPL tool that processes data doesn't virally transfer the license to its output. Copyrighted ffmpeg code isn't incorporated into the video output. The LLM didn't just conjure up equivalent behavior to git without ingesting the code and transforming it as new output. There is no other behavioral description that would reproduce all needed functionality.
Medium, substitutability, basics of copyright law.
Fair point on medium - this was a lazy example.
Substitutability probably doesn't apply here in the way you're implying, and if it did it would likely be hampered by the 9th Circuit's findings about transformation in Sony v. Connectix. Arguments here likely would look at Rust not having a stable ABI, and hence not being inherently substitutable as a library (grit-lib); it's less clear as an executable (grit-cli) on that side.
basics of copyright law - the fundamental thing being protected is the expression... is a rust program's expression the same expression as a c program? I'd say generally not.
The test suite could test aspects of the architecture/design of the codebase that are not necessary for interoperability and constitute novel expression of a piece of software in a way that is not at all language specific.
By definition a test suite is about testing interoperability with the test suite. An HTTP test suite should likely test for whether response code 418 is implemented a particular way and while humorous it would still be an interop test no?
No, the git test suite is about testing the git codebase. If you want something like that, you need a conformance suite, which does not exist for git.
functional parts not being copyrightable means that you can't claim a program is a copyright violation based on the fact it does the exact same thing based on compatibility reasons (you can copy what the program does). E.g. git stores refs in .git/refs, so does grit, that's not a violation. You still can't copy the program.
Yes... and now we get to the fact specific question of "did they copy the program". Or actually the answer to that is plainly "no" - they made something similar from it - and didn't run ctrl-c ctrl-v in an unlicensed manner, but "did they copy the relevant facets of the program into the new similar thing".
Making something similar is copying for the purpose of copyright law. If I trace over a Disney character it's still copyright Disney.
No. You're allowed to make a similar tool, the functional elements are not copyrightable. There's a long history, predating LLMs by many decades, of doing this in the software industry.
My use of the word "similar" does not imply here that I think it's obvious that they are "similar" in any copyrightable elements - whether they are or not is one of the interesting questions I think this case would have to resolve.
Incidentally you're also allowed to make similar creative elements so long as they aren't copies and you did so independently... which could actually come up in a case like this (imagine the LLM produced a similar function to some function in the original... but the original wasn't in the context window at the time. Not at all unlikely with code where there often is only one or two natural ways to write something).
If feeding the source code through a compiler yields a derivative work, why wouldn't feeding it to an LLM give the same result?
Because compilers and LLMs do different things, and what is done matters, so you can't reason by stepping from one to the other.
Compilers don't axiomatically yield derivative works, they simply in practice do because for non-trivial programs they preserve copyrightable elements of the work in the output.
Well compilers are a mechanical transformation and if that were sufficient to free you of IP law then IP law wouldn't work.
An LLM is also a computer program which takes input and produces output related in some way to that input. However I don't think most people would view it as a "mere" mechanical transformation. One could tautologically argue that an LLM blends the user input with the training inputs which is a sort of transformation and further that the LLM itself is a computer program thus it is mechanical in nature. However it should be immediately obvious that such an overly literal interpretation is in danger of subsuming human work as well. Where the boundary lies is an unanswered question.
Related, compilers can pose a problem depending on what the output includes. For example common lisp compilers that aren't under a permissive license are a minefield because regardless of what anyone might say the image that gets output includes (approximately) the full language implementation verbatim in addition to the user's program.
I suspect that the issue is more likely that the LLM code doesn't have an author and hence some parts of it can't be licensed. It's less likely that it's infringing on git's copyright, for various reasons. (I am not a lawyer, but I do read copyright law for funsies).
https://www.copyright.gov/newsnet/2025/1060.html
> It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements. This can include situations where a human-authored work is perceptible in an AI output, or a human makes creative arrangements or modifications of the output, but not the mere provision of prompts.
Well that's interesting.
Also "just" the legal opinion of a government office. It has yet to be tested in court
why wouldn't it? If you run git through a compiler it's still copyright the git devs, same if you run it through an LLM.
What makes you think that's what the article says that it did? There's a lot of specific nuance and it doesn't say that anywhere. In fact it speaks of making a test suite pass only. This is the classic cleanroom bios from specs approach but no need to extract it as the test is available to run and there's nothing in the GPL that suggests that running a test suite infects software that you run it on.
Surely git’s source is already in the LLM’s training corpus. So this is far from a clean-room approach.
I'm not a copyright lawyer, but it seems pretty clear to me you can't wash a license using an LLM.
[US jurisdiction]: Anything in the result written by the LLM can not be copyright by anyone.
Anything in the result written by a human can be, and if it was all emitted by the LLM then that portion originally written by a human carries its own copyright.
As a work of an LLM, the entirety presumably can not be copyright, at all. Portions written by humans presumably carry their original copyright.
> [US jurisdiction]: Anything in the result written by the LLM can not be copyright by anyone.
This is a bit stronger than the actual report where this has been discussed finds. See part 2 in https://www.copyright.gov/ai/ for details, but TL;DR, parts where humans have control over the expression may be copyrightable. But working out which parts those are is likely a difficult question (would likely require proof of provenance across many of those LLM sessions)
Not a fan of this trend of "cleaning" GPL licensed software and releasing under permissive licenses. Also why I'm not a fan of UUtils nor Canonical's early adoption of it in Ubuntu.
The intent here is extraction of all the value provided by copyleft projects without the obligation to give back. Whether it's technically legal or not, it's disgusting behavior IMO.
That’s explicitly not what’s happening with uutils; they have contributed fixes and test cases back to upstream
And just like that, it was forked by Microsoft a few days ago. Handed to them on a silver platter.
> Not a fan of this trend of "cleaning" GPL licensed software
> Whether it's technically legal or not, it's disgusting behavior IMO.
GNU was originally developed to "clean" UNIX from the AT&T license.
Knowing what you don't know is such an important skill in life and your career. And I 100% agree with you that the author is, well, off their rocker.
Let me give an example: I could take Goldeneye from the N64, extract the binary and then run it through an LLM to disassemble it and possibly rewrite it in a modern higher-level language. Do you think Nintendo would look at that and say "well, he did a lot of work so he's escaped our license"? Of course not. It's just silly.
Ingesting the source code and producing output in another language is quite clearly a derivative work. You don't need to be an IP lawyer to figure that out.
Now, if you went to Claude and gave it documentation and told it to produce something that was compatible, would that be a derivative work and thus covered by the GPL? I would guess probably. But I'm not 100% sure anymore. I wouldn't risk it however.
Here's another thought experiment: what if someone takes this supposedly MIT licensed source tree, plugs it into another LLM and asks it to produce the output in C? Now how is it licensed? It might be very similar. After all, there are only so many ways to produce a SHA1 hash and so many ways to do a command line parser.
But this then makes it an interesting legal issue. In the Oracle v. Google court case, this was a key issue. Google successfully argued there's only so many ways to write a loop so just because a loop is similar to the source, that doesn't mean it's copyright infringement (as Oracle argued).
Anyway, it's a crazy position to take.
Well that is already how it is done with numerous multi-decade open rewrites of closed games. They usually require the asset pack.
I don't know how this squares with law, but Oracle v Google gave a very valuable judgment to the public that an API is not copyrightable. If we take the LLM out of it, that's all we are talking about in the pure case.
Of course, we can't take the LLM out, but it is the starting point.
This is simply plagiarism of GPL-licensed code, and license-washing as well. I can understand working backwards from a test suite, but this literally just reads the original source:
https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...
LLM users seem to live in another world where stealing everything that isn't bolted down, and passing it off as their own work, is acceptable.
I’m all for memory safety and such but honestly what’s the use case for this? Showing off agentic development? In 10+ years git has never failed on a memory overflow or else. Sometimes software is “good as is” and I’m pretty confident git classifies as such. I’ve also never really hit the limitations of git, even with teams of 20+ developers and lots of binary artefacts. You got to really stretch git limitations, in which case you might need to move away from git, and a rust rewrite will not help in any way whatsoever. So again … why?
I addressed this in the post, but Git has no linkable library and never has. If you want to do even something small, you need to fork/exec a process and communicate with it via stdin/out. Or completely reimplement it and all of the edge cases - for example, reading even one object can be either loose (easy) or in a packfile (much more difficult). Reading a reference (what SHA does a branch point to) can be in a loose file, a packed-refs file, or a reftable. etc.
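To give a feel for the edge cases, here is a rough sketch of resolving a branch name by checking a loose ref file first and then falling back to `packed-refs`. The function names are hypothetical and this is not Grit's code; it assumes the classic files backend and deliberately ignores symbolic refs, reftable, and worktrees.

```rust
use std::fs;
use std::path::Path;

/// Look up `refname` (e.g. "refs/heads/main") in the text of a
/// `packed-refs` file. Entries look like "<hex-oid> <refname>";
/// '#' lines are headers and '^' lines are peeled tag targets.
pub fn lookup_packed_ref(packed_refs: &str, refname: &str) -> Option<String> {
    for line in packed_refs.lines() {
        if line.starts_with('#') || line.starts_with('^') {
            continue;
        }
        if let Some((oid, name)) = line.split_once(' ') {
            if name.trim_end() == refname {
                return Some(oid.to_string());
            }
        }
    }
    None
}

/// Resolve a ref: the loose file wins if present, otherwise `packed-refs`.
pub fn resolve_ref(git_dir: &Path, refname: &str) -> Option<String> {
    if let Ok(text) = fs::read_to_string(git_dir.join(refname)) {
        let text = text.trim();
        if text.starts_with("ref: ") {
            return None; // symbolic ref: not followed in this sketch
        }
        if !text.is_empty() {
            return Some(text.to_string());
        }
    }
    let packed = fs::read_to_string(git_dir.join("packed-refs")).ok()?;
    lookup_packed_ref(&packed, refname)
}
```

Even this small slice needs two storage formats and an ordering rule, and the object side (loose objects versus packfiles with deltas) is considerably more involved.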
There is no way anyone would ever use this for its CLI - it will almost certainly always be slower and worse in every way, even if I get it stable (which it's currently not). You can use libgit2 (a project I also helped kickstart), or Gitoxide (a project GitButler also currently helps drive) - they are faster and better in nearly every way, but they are not feature complete.
This isn't for the person using Git. This is for someone trying to build a tool that wants to use parts of Git, which is different.
But libgit2 exists, right? It may not have 100% feature parity with git, but that's a linkable library that gives you a lot of functionality when working with git repos.
the author of this post (whom you were responding to) made `libgit`, the library that preceded `libgit2`, and contributed to libgit2 a long time ago as well. Here he is in 2010 writing about libgit2: https://github.blog/news-insights/libgit2-a-git-linkable-lib...
To be clear, I 100% did not make libgit. I did help the libgit2 project get off the ground.
Nice experiment, but a bit expensive.
I work on Beagle, a git-compatible SCM [1]. I use ABC, Abstractionless C [2] dialect with slices, optional range checking, etc. So far, memory safety was the least of my concerns, frankly. Most of the thorny issues would be equally thorny in Rust (e.g. right now: reflog zeroed when VM ran out of disk space; must be some state machine issue or an OS level glitch). Also, forking off a C process (no runtime) is cheap enough that you actually want to do that more.
But, those are all technicalities. The key issue I see with the approach: the data structures and algos of git have been fanatically fine tuned for that particular application with those particular usage patterns. By very sophisticated low-level C programmers. So, quite likely, any other app/lib working with that store will always be a suboptimal fit. I would recommend read-only access only, esp for LLM code.
Meanwhile, git's underlying data model (blobs/trees/commits) is very simple and very much internet-standard level. Decoupling at that interface is so much easier with so much less issues looming.
May look differently from your vantage point though.
[1]: https://github.com/gritzko/beagle
[2]: https://replicated.wiki/blog/abc
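As a rough illustration of how small the blob/tree/commit model mentioned above is, here is a Python sketch that serializes and hashes the three object kinds. The helper names are invented for this example, it assumes SHA-1 object ids, and the tree sort is simplified (real Git orders directory entries as if their names ended in `/`).

```python
import hashlib


def object_id(obj_type, payload):
    """SHA-1 object id: hash of '<type> <size>\\0' followed by the payload."""
    header = f"{obj_type} {len(payload)}".encode() + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def tree_payload(entries):
    """Serialize (mode, name, hex_oid) entries into a tree object's payload."""
    out = b""
    # Simplified ordering: plain sort by name (see the caveat in the lead-in).
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
    return out


def commit_payload(tree_oid, parent_oids, author, message):
    """Serialize a commit: header lines, a blank line, then the message."""
    lines = [f"tree {tree_oid}"]
    lines += [f"parent {p}" for p in parent_oids]
    # This sketch reuses the author identity as the committer.
    lines += [f"author {author}", f"committer {author}", "", message]
    return "\n".join(lines).encode()
```

A blob is just its bytes, a tree is a list of (mode, name, id) records, and a commit is a few text headers pointing at a tree and its parents. That is the interface the comment suggests decoupling at.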
License washing
"Grift"
(The f is for "feft")
How else could they launder the git license and set themselves up for a bait and switch later down the line?
Soon all the crustaceans will realize that C is better because AI can find all the vulnerabilities anyway.
Rust is some ugly poo.
When we go a full year without a lpe in the linux kernel I'll start considering it...
I don’t understand. Gitoxide exists and is great.
It might have missing pieces, but it’s easier to vibecode any needed networking additions to Gitoxide (which is maintained) than to just go and burn tokens trying to clone all of git again.
Git wants to add Rust. Gitoxide is a multi year project that’s going to be more maintained than an ad-hoc “it says it passes the test” vibeclone.
I’m not even against vibecloning things when it’s useful, but this shows no benefits. Git is a beloved tool that few people dislike, it’s not like vinext (people disliking the vendor lock-in they have with nextjs).
Also execs should keep in mind that “we burned thousands of dollars on tokens to re-create this beloved software so we can have our own copy”, even without the copyright/licensing argument, just isn’t something positive that the community will react positively to.
It doesn’t feel nice to see your favourite works cloned for no benefit. We’re past the “it was an experiment to see how far AI can go” stage now.
> It's like giving wishes as a genie. You gotta be super explicit with the ground rules.
I have used the genie analogy before. It used to feel more like a Golem but now with the whole Fable sabotage mode https://jonready.com/blog/posts/claude-fable5-is-allowed-to-... it certainly feels more Genie-like.
Previously I described it as "Models give you what you ask for, not what you want". Now with Fable they don't even give you what you want so idk.
I guess software licenses are meaningless now since anyone can decide their llm clone is not derivative.
Currently some act like it is fine to translate a project and change the license.
Recently Casey Muratori said in an adjacent context that the Microsoft AI push may be related to the fact that they have a long-standing and elaborate codebase. A large historic software company could have advantages in training models. They could provide extra value with their IP.
Now their IP is potentially in their models and accessible to anyone. If they actually train models on their IP, anyone could implement their APIs and slap a GPL license on it.
At that point, things will get very interesting.
A lot of their IP has been leaked over the years anyways. Source code of Windows XP is easily available, and there was the 2022 leak that contained the sources of Bing, Bing Maps, Cortana, etc
No one is training their models on their closed-source proprietary code. They own GitHub, why would they need to do this?
More is better.
They were already quite meaningless since nearly every FOSS copyright owner doesn't sue violators.
I'd be really interested in the opposite, just for the sake of experimentation since that's what these projects mostly are. They all seem to be rewrites for the sake of "performance", because the cost is now lower bc of AI. I'd be interested to see something like a port of Quake III in Python or Kubernetes in Perl, even Rails in Python would be goofy and really fun to see
> Quake III in Python
Probably doable - I remember most of Natural Selection 2 was Lua and it's more than a decade old at this point.
For Natural Selection 2, it was mainly the gameplay logic that was Lua, all running on their bespoke C++ game engine called Spark. But yeah, modern Python and Lua can be pushed to high performance.
Link: https://unknownworlds.com/en/news/spark-engine-questions-and...
Ah my mistake - I know they ditched Source to pursue their own engine and thought they were going all in on Lua, but doing the whole engine in Lua would have been a struggle I think.
> They all seem to be rewrites for the sake of "performance".
And yet this performs dramatically worse.
A slower, untested, incomplete git implementation, all for the low low price of $10-$15,000.
And don’t forget it wasted a bunch of human time in the process.
Someone mentioned elsewhere that a group is already working on a Rust port. How much could they have accomplished with this much money and time in software development resources?
Ok. AI can seemingly port stuff if you don’t test it thoroughly. I think that’s already been proven. At this point I’m seeing less and less value from these kinds of things. I’m sure it was fun for the author, but how does it help other people?
It's not for performance, it's for Rust.
If the first stereotype of Rust programmers is announcing that a project is in Rust before any other desirable software property (e.g. stable, performant, etc), the second stereotype is that Rust programmers love rewriting stuff in Rust, just for the sake of Rust.
(The 2.a. corollary is that they love rewriting GPL projects specifically and downgrading them to MIT/Apache)
But there is already gitoxide, an established git reimplementation in Rust. It even provides a library.
gitoxide was started in 2018, back when we were all writing code by hand, and has some reasonable adoption in the rust ecosystem. It's not feature complete, but if that was the issue then surely fixing that would be better than starting from scratch
It's not for Rust, it's for Library.
Well, it's sort of for Rust. GitButler is written in Rust and Jujutsu is written in Rust and we're both depending on fork/exec'ing to an unknown Git binary with no linkable library and no control over the subprocess to do a range of networking stuff. Neither Gitoxide nor libgit2 is capable of this either, as much as I love and support those projects.
This project is entirely about providing a feature complete (even if sloppy) library implementation of Git, which does not otherwise exist.
But... it's memory safe. Not that git has any important memory issue, but now people with skill issues in C can contribute to it without breaking stuff.
They can still break stuff memory safely.
In 15+ years of using Git, I have not had a single crash. What problem are you solving???
This is a problem with people with LLM psychosis who now think they have superpowers, they are completely unaware and just do things naively, they've lost all ability to think for themselves. The LLMs that are thinking for them certainly aren't going to tell them doing X is a bad idea, they're there to produce as many tokens as possible for their owners.
The motivation and rationale is there and was mentioned early on in the article.
I’m all for the hundreds of reasonable objections but this sort of trash mindless critique is as useless as what it denounces.
> A pretty fun experiment and I think we can shape this into something truly useful to the whole community.
Agree with first half of this sentence, we should all have fun with experiments.
> It was never based on a linkable and reentrant library, but instead on a "Unix" philosophy of chaining together simpler commands, which means that it's difficult to use it in long running processes without fork/exec overhead for everything.
Ahhh now we have philosophical disagreement in the only place in the entire article that says "why". Unix is a feature, it's arguably more important in current time: https://aperocky.com/blog/post.html?slug=unix-philosophy-age...
You cut that citation conveniently short.
> It was never based on a linkable and reentrant library, but instead on a "Unix" philosophy of chaining together simpler commands, which means that it's difficult to use it in long running processes without fork/exec overhead for everything.
Added it in full. It still squarely falls under "this is for fun/are you seriously doing this for this purpose" territory for me.
Git operates at the filesystem level; the Unix behavior is just getting buried. You cannot rewrite git into a linkable library and decide it's now not Unix. Its entire behavior is Unix, which is why it's awesome.
My intent with this project is not to replace Git in any way. I don't care about the CLI part of this project.
The point is to provide a feature-complete reentrant linkable library. Even if it's an ugly and slow one, this is still the only thing that exists that covers those points - Gitoxide and libgit2 are both awesome but they are not feature complete.
Isn’t git already just an interface over libgit? How is that different?
Git is famously not built around a (reusable) library, hence why we have things like libgit2 (unrelated to git) and why any porcelain on top of git has to resort to calling the binary and parsing its text output.
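For anyone who hasn't had to do it, "calling the binary and parsing its text output" looks roughly like the Python sketch below. `parse_status` and `git_status` are hypothetical helper names; the sketch assumes a `git` executable on `PATH` and the `--porcelain=v1` status format, and it ignores quoted paths and other corner cases a real tool would need to handle.

```python
import subprocess


def parse_status(text):
    """Parse `git status --porcelain=v1` output into (xy, path, original_path) tuples."""
    entries = []
    for line in text.splitlines():
        if len(line) < 4:
            continue
        # Two status characters, one space, then the path.
        xy, rest = line[:2], line[3:]
        original = None
        if xy[0] in "RC" and " -> " in rest:
            # Renames and copies are printed as "old -> new".
            original, rest = rest.split(" -> ", 1)
        entries.append((xy, rest, original))
    return entries


def git_status(repo_dir):
    """Fork/exec whatever git binary is on PATH and parse its text output."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "status", "--porcelain=v1"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_status(result.stdout)
```

Every call pays for a process spawn, and the caller has no say over which Git version answers, which is the dependency the library approach is trying to remove.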
I was not aware of that. I knew it was split but didn’t know that split wasn’t useful to others.
Thanks.
libgit.a isn't reentrant. It will call `die()` on many errors. If you link to it in a long running binary, it will kill your process on error.
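A small Python analogy for why `die()` is a problem in a long-running process (this is an illustration of the pattern, not Git's code; all names are made up, and 128 is used because that is the exit status Git's `die()` uses):

```python
import sys

HEX = "0123456789abcdef"


def parse_oid_or_die(text):
    """CLI-style helper: on bad input, print a message and terminate the process."""
    if len(text) != 40 or any(c not in HEX for c in text):
        print(f"fatal: bad object id {text!r}", file=sys.stderr)
        sys.exit(128)
    return text


class BadObjectId(ValueError):
    """Error a library caller can catch and recover from."""


def parse_oid(text):
    """Library-style helper: report bad input to the caller instead of exiting."""
    if len(text) != 40 or any(c not in HEX for c in text):
        raise BadObjectId(f"bad object id {text!r}")
    return text


def serve(requests):
    """Long-running loop: one bad request must not take the whole process down."""
    results = []
    for request in requests:
        try:
            results.append(("ok", parse_oid(request)))
        except BadObjectId as exc:
            results.append(("error", str(exc)))
    return results
```

Swap `parse_oid` for `parse_oid_or_die` inside `serve` and the first malformed request ends the server, which is fine for a one-shot CLI and not fine for a daemon or GUI.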
Libgit2 is meant to address this and I was heavily involved in the development of that project 15 years ago. It's great but it's not feature complete and its development is also completely separate from git development, so it's out of sync and constantly struggling to keep up.
You're asking people to trust you and hand their codebase/IP to your tool while showing them exactly how you treat other people's code/licenses by "deciding" to not carry forward the GPL license.
I have been working on the same problem in other areas. My ultimate goal is to rewrite nginx in Rust, passing as many of the upstream tests as possible while leveraging the strongest aspects of Rust's ecosystem - i.e. rustls (modern memory safe OpenSSL), Tokio (async runtime), h2 (http 2 impl) rather than implementing from scratch like the upstream. I started with Lua, then ported over Valkey, and am now working on nginx. The reason was that I wanted to learn the ins and outs before taking on the most complex portion.
[1]. https://github.com/ianm199/lua-rs/tree/main Lua
[2]. https://github.com/ianm199/valdr Valkey/ Redis
[3]. https://github.com/ianm199/nginx-rs-port nginx
Happy to answer any questions on the approach! When I started a few weeks ago the harnesses on their own were not good enough to get very far without a "meta harness" of sorts but that is changing largely with Claude Workloads and Mythos. A lot of the work is developing some custom tooling to move these along faster.
Yeah I got one, why? You aren't learning anything, you are just copying code from other codebases and smashing it together to make some nginx-rust thingie... for what actual goal?
Because C code in production is a ticking time bomb.
Well the biggest goal was to be useful. Nginx serves ~20% of the web, and memory unsafe languages might just become intractable for critical web-exposed infrastructure if the rate of critical CVEs on these rises faster than they can be patched, so a drop-in replacement would be a big deal in that world.
But in terms of learning I'm learning relatively little about how to type Rust into an editor but a lot about how to set up agentic loops that can autonomously get tests to pass and improve performance.
For example if you just tell a frontier model (gpt5.5 or Claude Code 4.8) to make some portion of the tests pass they will take forever and just bang their heads against it. I developed a framework to mimic a lot of these tests in nginx... but in minimal, non-blocking ways so you can run many in parallel with short feedback loops.
Similar for performance - how to make tons of performance benchmarks and expose maximum telemetry for agents to go and analyze the hotpaths etc.
You mean you rewrote the nginx test suite with smaller leaner tests ? How did you bootstrap that ? How do you know the leaner tests are equivalent to the real ones ?
Here is an example https://github.com/ianm199/nginx-rs-port/blob/main/crates/ng...
Basically I use these "kits" to prove that the behavior is working as expected with mocked data/ interfaces and then only after these kits pass I'll run the real test suite files as confirmation. So these let you iterate a lot faster than the official test suite because it is very slow.
These are bootstrapped from the real tests.
The other commenter was being a bit dismissive but this is the kind of thing I'm taking away as a real useful pattern to do verification of behavior at scale.
> Nginx serves ~20% of the web, memory unsafe languages might just become untractable for critical exposed to the web infra if the rate of critical CVE's on these rises faster than they can be patched
That is true, however did you actually do any research into nginx? Is it particularly prone to memory bugs?
I honestly don't know the answer but you seem to be coming from a place of C bad, therefore nginx super vulnerable?
In my experience with other web servers the vast majority of security bugs are string handling related (path/header injection), which your rewrite will not protect you from.
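To show what "string handling related" means in practice, here is a Python sketch of a path traversal check and a header injection check. Neither bug involves memory corruption, so a memory-safe language does not prevent them by itself. The function names are hypothetical, the document root is assumed not to be `/`, and symlinks are not considered.

```python
import posixpath


def naive_join(root, requested):
    """What a careless server does: glue the request path onto the document root."""
    return root + "/" + requested


def safe_join(root, requested):
    """Return the normalized path under root, or None if it would escape root."""
    root = posixpath.normpath(root)
    candidate = posixpath.normpath(posixpath.join(root, requested.lstrip("/")))
    if candidate == root or candidate.startswith(root + "/"):
        return candidate
    return None


def has_header_injection(value):
    """A CR or LF in a header value lets the sender smuggle in extra headers."""
    return "\r" in value or "\n" in value
```

With a root of `/srv/www`, a request for `../../etc/passwd` makes `naive_join` produce a path that normalizes to outside the root, while `safe_join` rejects it. The same logic error can be written in C, Rust, or anything else.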
https://securityaffairs.com/192132/hacking/nginx-rift-an-18-...
The project was inspired by that. Also, unlike most other projects, nginx is often directly exposed to the internet, which makes it more vulnerable than e.g. Redis/Valkey or something that would generally be running within a company's network.
"C Bad" is a bit reductionist... but I think there is some truth to the take " Until you have the evidence, don’t bother with hypothetical notions that someone can write 10 million lines of C without ubiquitious memory-unsafety vulnerabilities – it’s just Flat Earth Theory for software engineers" [1]
NSA and other government orgs are also pushing people to stop using C [2] for important software.
[1]. https://alexgaynor.net/2020/may/27/science-on-memory-unsafet... [2]. https://linuxsecurity.com/news/government/nsa-s-plea-stop-us...
I think the risks of a rewrite - especially when using AI - are far more problematic than memory safety. In the long run those C projects will be memory safe in the next five years using memory safe C implementations.
My perspective is that it is good to have a beta in a lot of directions.
No one really knows what the endgame of software security looks like.
So some people should try the port to rust angle, some should focus on hardening the C, some should explore more exotic options like formally provable languages etc
There is nothing wrong with trying different things. But the fundamental problem here is that projects and their communities are social projects and need to be to fulfill their purposes and to ensure long term maintenance. In a free software context, rewrites just like forks (1) are fundamentally an asocial (2) activity because they fragment the community (if successful) and then increase overall maintenance burden if not able to replace the original project completely (rarely the case). Disrespecting the license choice of the original authors makes this worse.
1) There may be situations where a fork makes sense (e.g. because one project cannot serve different use cases well). 2) Which is why usually a "higher goal" is used to justify this, e.g. authors pretend (or lie to themselves, or may be stupid enough to actually believe this) that some improvement in memory safety is really that important.
You won't get anywhere pressing your case here. This group has already found you guilty, and no argument will change their minds.
You've been caricatured into a blind AI-follower rust-rewriter-just-because type, and that's the surface they'll continually attack (you're wasting time, hurting the community, v2-itis, bikeshedding, premature optimization, copyright violation, moustache-twirling-evil-intent-rug-pull-later, etc etc etc).
Just continue in your work. It's good, and we need people like you.
Buddy, I think it is time to not engage with these models for a bit, you seem to have lost your mind.
We're literally in a thread on converting legacy C projects to idiomatic Rust? It seems many people are working on this same problem.
There are plenty of Rust based reverse proxies out there, why do you need to specifically rewrite nginx? You could also write a config adapter to Caddy, there are a billion options, but this is a wasted effort. The people who want to stick to their nginx configs won't use your project ever, and the people who actually care about security aren't going to use a vibe coded project.
I have no idea why you are making me spell this out, I thought it was pretty obvious.
nit: well-written C projects to legacy Rust
One very strong draw I feel, that's mentioned in this article: Rust's portability, its ability to be compiled to wasm & run very well anywhere.
This is coming from a cofounder at github, someone who probably knows precisely what the GPL is for. Whatever the legal merits, building on a GPL3 project's complete test suite and relicensing under MIT is not acting in good faith toward the original authors. I really find it disgusting and it makes me want to avoid gitbutler entirely.
I think you're saying that you don't believe in the freedoms to use the GPL licensed test suite for certain purposes which are explicitly allowed by the GPL.
You don't get to choose a license and then add extra terms to it when you don't feel like it's up to scratch. That's something explicitly not allowed by the GPL license.
Where does the GPL say you have the freedom to relicense code or derivatives under MIT by fiat?
Isn’t having to stay under the GPL a very big part of the GPL license?
> Where does the GPL say you have the freedom to relicense code or derivatives under MIT by fiat?
The first part of this sentence (where in the GPL) is unreached if the second part of it is unmet (relicense code or derivatives) which I contend it likely is. You're begging the question.
However:
> The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work
earlier:
> A “covered work” means either the unmodified Program or a work based on the Program.
It's that element that would be difficult to prove "work based on the Program"
Asking an LLM "here's a thing, rewrite it in Rust" is pretty clearly creating either a derivative work or a different form of the same work, just like asking a transpiler would.
There's no evidence that "here's a thing, rewrite it in Rust" is the technique Scott used here.
"here's a test suite, write code in rust that makes that suite pass" is reasonably supported by the article. That would likely not be a derivative work.
It's actually murkier than I suggest here. https://github.com/gitbutlerapp/grit/blob/main/AGENTS.md#sou...
Ew. So it tells the LLM where the git source is for the thing they’re duplicating, but I don’t see instructions saying not to read or copy those files or algorithms.
I could have missed them. I didn’t read everything. I did some quick searches.
But the fact that they’re not obvious is kind of troubling. And the fact that they didn’t give the LLM only the tests and documentation, withholding the source so it couldn’t look, would hurt any case they had for clean-room privileges in my eyes, ignoring my other comment with concerns about using the tests at all.
If we assume an unlicensed or MIT licensed test suite, an LLM could develop from that and documentation and you’d get something you could license MIT.
IMO, IANAL, etc.
And we’ll ignore the question of what the fact the LLM has certainly seen the git code during training means.
But the test suite would have to stay under the original license. And if you use a GPL test suite as the kernel to develop a program from, can you license it non-GPL? I’d question that personally. Same acronyms above apply.
This is the exact thing I'm not sure about. See https://news.ycombinator.com/item?id=48470397 where I posit a simpler question: if a `test_sum()` function is copyrighted, does writing a `sum(a, b)` function infringe on the copyright of the software product that `test_sum()` is a part of. I'd say no. There's another part of the GPL that applies here:
> A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an “aggregate” if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate.
So assuming that sum(a, b) is non-infringing and not combined to form a larger program (i.e. the tests aren't compiled into the grit code), then the GPL explicitly doesn't apply to this use
test_sum is assumedly relatively trivial. So as a lay person I’d expect some sort of obviousness test to apply. Like so much of the stuff in the Google/Oracle lawsuit.
But if you take all the individual tests used to test git as a whole, that seems far more unique. Seems like at that point you’re really having to duplicate the actual git internals, and that seems like it should be covered.
> test_sum is assumedly relatively trivial. So as a lay person I’d expect some sort of obviousness test to apply. Like so much of the stuff in the Google/Oracle lawsuit.
Feel free to extrapolate to the threshold where it's not and at that point apply.
> you’re really having to duplicate the actual git internals
Copyright covers the expression, not the method. So the Rust function:
is distinct from the C function:
Does anyone plan to use this?
Similarly, is there any momentum left for Cloudflare's EmDash? I can barely find any discussion after April.
It'd seem weird to plan to use this until the readme stops saying
> it has been nearly entirely written by agents and has not been used for realsies. It's probably currently unusably slow or completely broken in ways that are not exercised in the test suite.
Right now it's someone else's experiment that is still in the "might or might not pan out" stage.
There are a bunch of projects using the similar (not vibe coded, less fully featured) gitoxide project - there is demand for git-as-a-library.
I would not use this except to help us test it if interested. I'm announcing it because it's interesting and a milestone in the breadth of test coverage it can pass. It almost certainly cheated on a bunch of those tests and is not feature complete yet.
The author of gitoxide is also working on GitButler (who worked on this project) and we're pushing both projects forward and actively using and developing Gitoxide as well. This is simply a different and hopefully complementary approach to the same problem.
I was immediately excited about this wrapped in Python because the current Python git bindings are kind of obtuse, but they do work so I guess I can't complain.
But why switch to this?
Why not just make better Python bindings to libgit?
WordPress is/was successful because it's braindead and has a solid userbase. I am not trying to flame WP, but it's a quality to target a specific group of consumers.
It's an organic success, hard to replicate. If at all, CF can only make people migrate with massive effort. Marketing effort, selling lots of snake oil in the process. WP won't just hop on the hot new thing, WP is the definition of the opposite. It works for them. Why change?
Git is the same on the other side. It requires maintenance and improvements, surgical and correct. No git maintainer has time to learn a gigantic new codebase and they will stick with what works for them. For git users there are no advantages. So similarly it would require a long-term effort to push the project, building trust that it is somehow better, probably requiring Linus to say "it's great".
> The full build of all Git functionality in Rust is currently around 27M, but since a large part of it is a library, it could clearly be easily split up into domains of functionality - subcrates that do specific things.
I downloaded v0.3.99 for Linux x86_64 and stripped the binary. It ends up at 31 MB. The .text section is 25 MB.
I'm surprised by the large size. On my system /usr/bin/git is 4.7 MB, although git is split up into multiple programs. I'm not comparing apples to apples, but this is weird.
If anyone digs into the binary size, please share what you find.
I would also be interested.
I haven't dug into this at all yet, nor have I tried to optimize the size (or really, anything else).
However, the library part will be less than half of this - a lot of code is spent on the CLI specific stuff and would not be part of the library, which is mostly what I care about for the purposes of this project. The CLI part is just to try to prove the point that it actually does what Git does. The library part is what might be useful in that nothing else exists that does all of the things that it does (provide a reentrant linkable library that is feature complete with Git).
Looking at just the `grit` executable, 58 of the top 200 largest file->resulting code sources are from clap-rs' derive functionality i.e. it's the command-line parsing. The #1 largest is, surprisingly, merge_trees[1] which comes in at 183kiB final binary size. There isn't so much code in that file that it seems reasonable, so it's potentially one of the derives in use (Debug being a common culprit for bloat) that's blowing it out. After those outliers it starts to level out quickly.
Splitting it by crate: `grit` is 13.6MiB, `grit_lib` is 4.8MiB and then it's `std`, `rustls` and `regex_automata` that are the next largest. So as pure library you could hopefully shave off quite a bit of that 25MiB.
[1] https://github.com/gitbutlerapp/grit/blob/main/grit-lib/src/...
What’s the long term strategy for this code base? Does the author expect community code contribution or just bug reports or maybe just test contributions?
In 6 months, seeing no adoption, move the repo to maintenance mode. Archive in 12 months.
I'm happy to take contributions if you want to throw some tokens at it. Bug reports would be amazing, since I haven't tested it for real very much (enough to know you can do basics).
I want to get it to the point where we can replace fork/exec'ing to an unknown Git binary or having said binary be an external dependency for GitButler. The networking stuff (push/fetch) is currently an external dep for both GitButler and Jujutsu (and pretty much every other Git-based tool in the world). I'm pretty sure I can get the project good enough at these networking ops (including all the hairy credential stuff) to be able to not need those fork/exec calls.
He will probably be super happy if you star the project.
The agents did all the work but _somebody_ has to test it for real on their own data to find the edge cases overlooked by AI. That's what users are for nowadays.
Grit was the name of a _Ruby_ implementation of git way back when: https://github.com/mojombo/grit/. I believe it's actually what GitHub was built on then.
I created and named the Grit library that used to power GitHub. Scott Chacon (fellow GitHub cofounder, now CEO of GitButler) specifically asked my permission to re-use Grit as the name of this project, which I gladly granted. R is for Ruby. R is for Rust! Grit is dead. Long live Grit!
I started the project as Gust, but felt like Grit was such a better name. I asked Tom if I could boot the name back up again because I always liked it and he said it was fine.
Also, I worked on the Ruby Grit pretty extensively during the early days of GitHub, so hopefully I earned the right to carry on the mantle. :)
Okay name is taken. Let's rename it to Grift.
if we conquer the universe, i would love to leave one planet alone for rust users. in this planet, the only allowed programming language is RUST! everything should be written in RUST
pretty dystopian to ask a robot to recreate your favorite software just so you can relicense it for your business venture
We're choosing a license that is usable by the entire community. Our goal is a linkable library, which makes GPL impossible. If we had chosen to go with LGPL or GPL with linking exception (like libgit2), it would have the same issue of changing the license, so we went with whatever was the most permissive so everyone could use it for anything if they wish. This has nothing to do with business - I hope I can get the project to the point where Jujutsu or whomever can use whatever is valuable here for whatever they want.
We clearly learned from how Git does operations and emulated it in order to function interoperably, the same way that Gitoxide and libgit2 have, and released it under a license that would be the most valuable for people wanting to use a linkable library, the same way that Gitoxide and libgit2 have.
> Our goal is a linkable library, which makes GPL impossible
Not impossible. It forces the code using the library to be under a GPL-compatible license and requires the binary to be released under the GPL license.
The distinction is quite important. It's only impossible in the mind of someone who wants to release proprietary software. Even for people releasing software under permissive license it's not impossible, just highly inconvenient (and the LGPL is always an option in this case).
>We're choosing a license that is usable by the entire community.
What a weaselly way to put it.
A GPL library, as I'm sure you know, is perfectly usable by anyone including jujutsu and anyone else. They just have to also license under the GPL and this is no barrier to open source projects.
I did something similar and called it gitredoxide since I started with gitoxide.
There's already a git-in-Rust project that is making good progress
https://github.com/gitoxidelabs/gitoxide
Gitoxide is also developed primarily by Byron, who also is part of the GitButler team. We're pushing both projects forward.
Gitoxide is mentioned in this write up, yes,
> Currently both Gitoxide and libgit2's networking functionality is either partial, slow or non-existant. Both GitButler and Jujutsu rely on forking out to Git in order to push or pull data. A big reason for this is the incredibly complicated credential logic involved, but all of this is (theoretically) currently covered in Grit.
So, they "decided" it's not a derivative and thus can be licensed under MIT instead of GPL....
Yeah, that's usually how contracts work.
You decide whether you have followed it or not. The other party will decide if they agree. If in dispute, you go to a judge and they decide also.
a lot of things are just "decided" really.
it's just in this case it's the author. we'll have to wait and see who decides to challenge it
I continue to be surprised by the lack of understanding around copyright law when it comes to AI.
They still haven't explained why I should bother. Is it faster, easier, more efficient, more capable, more scalable on large codebases? Does it support better workflows?
In fact, I would rather it stay C for 15 more years.
I'm assuming you didn't read the article, since I'm pretty sure I covered all of this, but I'm happy to respond.
Don't bother.
It's probably not for you. It's slower, more obtuse, more bloated, less capable, exponentially less scalable at any size. Canonical Git is better in every way, except being a linkable library.
Even in the arena of linkable libraries that can do Git stuff, Gitoxide (Rust) and libgit2 (C, with Rust bindings via the git2 crate) are both better, they're just not feature complete. That is the only point of this project.
I hate all you llm people, you're ruining everything, shoving slop down everyone's mouth and telling us you now have superpowers. No you're stealing and making everything as shitty as possible in the process.
I pray everything switches to usage based billing and the curtains can close on this era.
> the result is Grit, a from-scratch, library-based, memory-safe, idiomatic Rust reimplementation of Git that passes over 99% of the entire Git test suite.
Why not 100%?
> It's not actually passing every single test, though that is on purpose. I did mark some parts of the testing suite as "skipped" because I don't think it's worth recreating them in a library like this
> 41,715 / 42,001 tests passing (99.3%)
So it is not the entire suite, then, but somehow that was worth burning ~$8,000 worth of tokens?
> Why not 100%?
From the article
> It's not actually passing every single test, though that is on purpose. I did mark some parts of the testing suite as "skipped" because I don't think it's worth recreating them in a library like this - email related stuff, i18n, perforce/svn importers, some of the midx/bitmap stuff - things of that nature. However, for everything that I'm sure is relevant to nearly anyone reading this, the Grit library/CLI can now fully pass the Git test suite.
So 0.7% of tests fail, therefore it was 100% a waste of time?
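For anyone checking the numbers, here is a quick sketch of the arithmetic behind the quoted "41,715 / 42,001" figure. The variable names are mine, and the counts alone don't say whether the remainder is skipped or actually failing.

```python
# Counts as quoted from the article; variable names are illustrative only.
passed = 41_715
total = 42_001

not_passing = total - passed            # skipped or failing; the counts don't distinguish
pass_rate = 100 * passed / total        # percentage of the suite that passes
gap_rate = 100 * not_passing / total    # percentage that does not

print(f"{not_passing} of {total} tests not passing")
print(f"pass rate: {pass_rate:.1f}%")
print(f"gap: {gap_rate:.1f}%")
```

So the "0.7%" being argued about is 286 tests.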
I think we are talking about ROI in terms of solving real world problems and making real impact, not the fact that a tool has been ported from language X to language Y.
Given the author already admitted that the implementation is slow anyway, you are no better off than you would be using gitoxide instead, and that has support for Windows, whereas Grit does not.
> Currently both Gitoxide and libgit2's networking functionality is either partial, slow or non-existent. Both GitButler and Jujutsu rely on forking out to Git in order to push or pull data. A big reason for this is the incredibly complicated credential logic involved, but all of this is (theoretically) currently covered in Grit.
Regardless, what's the point?
> One of the main things I would like to be able to use it for is to be able to bundle complex push/fetch functionality into GitButler and other standalone Git tools needing network functionality (such as Jujutsu).
> Having parts of Git as discrete, embeddable slices of library also enables things like building custom Git servers or client functionality in Rust.
> The full build of all Git functionality in Rust is currently around 27M, but since a large part of it is a library, it could clearly be easily split up into domains of functionality - subcrates that do specific things. Perhaps you could simply use the subset you need.
The article starts out with paragraphs about the history and motivation.
> it made me wonder about the feasibility of using that same approach to accomplish something I've been dreaming about for 15 years now,
> which means that it's difficult to use it in long running processes without fork/exec overhead for everything.
> What if we used the same basic idea that Anthropic used on their from-scratch C compiler? Start a brand new implementation, design it as a Rust library, then throw a swarm of agents at the problem
maybe it's an academic project. proof they could reimplement something useful & complex?
There are often good reasons for these purposeful digressions. E.g., in nginx the unit tests cover ciphers that are considered unsafe and are not supported by modern libraries like rustls https://github.com/rustls/rustls. It is reasonable to make a new implementation and leave behind a bit of baggage.
The author actually estimated $10,000-$15,000 worth of tokens.
It depends on whether the 0.7% of failures are in tests of deliberately unimplemented features like email or in corner cases of implemented features. It sounds like it's at least mostly the former, hopefully it's 100% the former.
I don't care if any git I use has email features. IIUC, even most of the people that use git with email don't directly use the email features, they use the patch set features like `git am`. I expect `git am` to work, I don't expect git to actually do email.
In the age of AI, writing things that used to take years can now be done in months or weeks if you have deep enough pockets for it.
Reimplementation is a particularly juicy target because it's easy to test. Imagine someone writing a better browser than Chrome from scratch in just a year.
Because of this, business moats based on difficulty of implementation are effectively gone.
> can now be done in months or weeks if you have deep enough pockets for it.
Especially if the same thing already exists in open source for the model to plagiarize for you.