Yes. The problem is that most memory errors (out of bounds, use after free, etc.) result in a vulnerability. Only a minority of logic errors do.
For operating system kernels, browsers, etc., vulnerabilities have a much, much bigger impact than logic errors: vulnerabilities need to be fixed and released immediately. Most logic errors don't (sure, it depends on the issue and on the type of software).
I would probably say "for memory unsafe languages, 80% of the _impact_ is due to memory vulnerabilities"
logic errors aren't memory errors, unless you have some complex piece of logic for deallocating resources, which, yeah, is always tricky and should just generally be avoided
"Majority" could mean a few things; I wouldn't be surprised if the majority of discovered memory bugs are spatial, but I'd expect the majority of widely exploited memory bugs to be temporal (or pseudo-temporal, like type confusions).
Table stakes, but people still mess up on it constantly. The "yeah, but that's only a problem if you're an idiot" approach to this kind of thing hasn't served us very well, so it's good to see something actually being done.
Trains shouldn't collide if the driver is correctly observing the signals, that's table stakes too. But rather than exclusively focussing on improving track to reduce derailments we also install train protection systems that automatically intervene when the driver does miss a signal. Cause that happens a lot more than a derailment. Even though "pay attention, see red signal? stop!" is conceptually super easy.
I'm not saying it's not important, it is. I just don't believe that '[the] majority of memory bugs are from out of bounds access'. That was maybe true 20 years ago, when an unbounded strcpy to an unprotected return pointer on the stack was super common and exploiting that kind of vulnerability was what most vulndev consisted of.
This brings C one tiny step closer to the state of the art, which is commendable, but I don't believe codebases which start using this will reduce their published vulnerability count significantly. Making use of this requires effort and diligence, and I believe most codebases that can expend such effort already have a pretty good security track record.
The majority of security vulnerabilities in languages like C that aren’t memory safe are due to memory safety issues like UAF, buffer overflows etc etc. I don’t think I’ve seen finer grained research that tries to break it out by class of memory safety issue. The data is something like 80% of reported vulnerabilities in code written in these languages are due to memory safety issues. This doesn’t mean there aren’t other issues. It just means that it’s the cheapest exploit to search for when you are trying to break into a C/C++ service.
And in terms of how easy it is to convert a memory safety issue into an exploit, it’s not meaningfully much harder. The harder pieces are when sandboxing comes into play so that for example exploiting V8 doesn’t give you arbitrary broader access if the compromised process is itself sandboxed.
Actually, you may be right: according to Google's Project Zero [1], ~50% are use-after-free and only ~20% are out-of-bounds errors. However, this is for errors that resulted in major exploits; I'm not sure what the overall data is.
Exciting! It doesn't imply that we should now sprinkle the new annotations everywhere. We still should keep working with proper iterators and robust data structures, and those would need to add such annotations.
> To tackle this issue, the model incorporates the concept of a “wide pointer” (a.k.a. fat pointer) – a larger pointer that carries bounds information alongside the pointer value.
Bounds checking with fat pointers existed as a set of patches for GCC in the early 2000's. (C front end only).
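For intuition, here's a minimal sketch (our own layout and names, not necessarily the representation the RFC settled on) of what a wide pointer carries and how a checked access would use it:

    #include <stdlib.h>

    typedef struct {
        int *ptr;    /* current pointer value */
        int *lower;  /* lowest valid address */
        int *upper;  /* one past the last valid element */
    } wide_int_ptr;

    int checked_load(wide_int_ptr p, size_t i) {
        if (p.ptr + i < p.lower || p.ptr + i >= p.upper)
            abort();   /* where the compiler would emit a trap */
        return p.ptr[i];
    }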
Amazing, this is a life-saving feature for C developers. Apparently it's not complete yet? I will apply this to my code once the feature is included in LLVM and GCC.
Would be nice if the annotations could also be applied to structure fields.
That's amazing. Thanks for that reference. If it's good enough for the kernel, then it's good enough for me to start using in my own projects.
It's really cool that the kernel is using this. The compiler must be generating simple bounds checking code with traps instead of crazy stuff involving magical C standard library functions. Perfect for freestanding nostdlib projects.
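For context, a narrower cousin of these annotations is already accepted by mainline GCC and Clang and used in the Linux kernel: the counted_by attribute on flexible array members. A hedged sketch (the exact spelling depends on your compiler version):

    #include <stddef.h>

    /* counted_by ties data[]'s bound to the count member, so fortified
       routines and __builtin_dynamic_object_size can see the runtime size */
    struct pkt {
        size_t count;
        int    data[] __attribute__((counted_by(count)));
    };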
I don't use much C but if you add them to the standard they'll probably trickle down to C++ compilers by 2045 and I'll have a good 10 years to use them before I retire.
Well, I'll take them over the nothing you're giving me :D
But in all seriousness, I want this:
    typedef struct Result { int value; int err; } Result;

    #define TRY(expr) \
        ({ \
            Result _res = (expr); /* evaluate the argument exactly once */ \
            if (failed(_res)) return _res; \
            _res.value; \
        })

    Result f1(...) { ... }

    Result f2() {
        int res = TRY(f1(...)); // <<<<
        ...
        return success();
    }
Can't be done with lambdas, since the macro needs to return out of the actual function, not the lambda. Pretty much Rust's question-mark operator, but without Rust, or Zig's "try", but without Zig.
I see, thanks! There is general consensus that statement expressions should become part of ISO C, but some lack of time to get it done. I am not part of WG21 though, so can't say anything about C++.
You can use pointer types by using a typedef first, but I agree this is not nice (I hope we will fix this in future C). But then, I think this is a minor inconvenience for having an otherwise working span type in C.
Even better, starting with C++26 (and expected to be applied as a DR to previous versions), hardened runtimes now have a portable way to be configured across compilers, instead of each having its own approach.
However, you still need something like -fbounds-safety in C++, due to the copy-paste compatibility with C, and too many people writing Orthodox C++, C with Classes, or Better C kinds of code that we cannot get rid of.
Only if you're not able to do import std or use pre-compiled headers, and aren't using modern IDEs with "just my code" filters.
As someone who has enjoyed C++ since 1993, alongside other ecosystems: many of the complained-about pain points of using C++ are self-inflicted, by avoiding modern tools.
Heck, C++ had nice .NET- and Java-like frameworks, with bounds checking even, before those two came to exist, and nowadays those frameworks are mostly gone, with the exception of Qt and the C++ Builder ones, due to bias.
That's an objectively correct statement, but I don't see how it makes sense as a response to my comment, as I'm advocating to use the more advanced feature-rich tool over the compiler-specific-hacks one.
If you're advocating switching languages, then there's no reason to stop at C++. It's more common to propose just converting the universe to Rust, but assembly also enjoys the possibility of being fairly easy to drop in on an existing C project.
I looked at trying to implement -fbounds-safety and -Wunsafe-buffer on a reasonably large codebase (4,000 C and C++ files), and it's basically impossible.
You have to instrument every single file. It can be done in stages though. Just turn the flag on one-by-one for each file. The xnu kernel is _mostly_ instrumented with -fbounds-safety.
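A hedged sketch of that staging (file names made up; the -Xclang spelling matches the Apple clang invocation quoted elsewhere in this thread):

    clang -c -Xclang -fbounds-safety parser.c   # migrated: checks on
    clang -c lexer.c                            # not yet migrated
    clang parser.o lexer.o -o app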
Plug: In theory you could auto-convert to a memory-safe subset of C++ as a build step. Auto-converted code would have some run-time overhead, but you can mark any performance-sensitive parts of the code to be exempt from conversion. And you get lifetime and type safety too. For full coverage, performance-sensitive parts of the code can be manually converted to the safe subset to minimize overhead. (Interfaces in extern C blocks remain unconverted by default to maintain ABI compatibility.)
This sounds like the kind of low-thought pattern-based repetitive task where you could tell an LLM to do it and almost certainly expect a fully correct result (and for it to find some bugs along the way), especially if there's some test coverage for it to verify itself against. If you're skeptical, you could tell it to do it on some files you've already converted by hand and compare the results. This kind of thing was a slam dunk for an LLM even a year or two ago.
Because it can only catch a subset of issues, it's not guaranteed to catch issues (it's probabilistic), even issues it "could" catch may not be caught due to the temporal distance between the free and a subsequent use, and it requires the use of a different allocator that supports it. It's also unclear to me how it knows whether a given free is for a sampled or unsampled region - I suspect it must capture all free/realloc calls to accomplish that, but that does imply all of these are sampled.
It’s nowhere near the same as robust bounds checking.
The real question is adoption friction. The annotation requirement means this won't just slot into existing codebases — someone has to go through and mark up every buffer relationship. Google turning on libcxx hardening in production with <0.5% overhead is compelling precisely because it required zero source changes.
The incremental path matters more than the theoretical coverage. I'd love to see benchmarks on a real project — how many annotations per KLOC, and what % of OOB bugs it actually catches in practice vs. what ASAN already finds in CI.
The WebKit folks have apparently been very successful with the annotations approach[0]. It's a shame that a few of the loudest folks in WG21 have decided that C++ already has the exact right number of viral annotations already, and that the language couldn't possibly survive this approach being standardized.
Has any progress been made on this? I remember seeing this proposal 3 or 4 years ago but it looks like it still hasn't been implemented. It's a shame because it seems like a useful feature. It looks like Microsoft has something similar (https://learn.microsoft.com/en-us/cpp/code-quality/understan...) but it would be nice to have something that worked on other platforms.
https://discourse.llvm.org/t/the-preview-of-fbounds-safety-i...:
“-fbounds-safety is a language extension to enforce a strong bounds safety guarantee for C. Here is our original RFC.
We are thrilled to announce that the preview implementation of -fbounds-safety is publicly available at this fork of llvm-project. Please note that we are still actively working on incrementally open-sourcing this feature in the llvm.org/llvm-project . To date, we have landed only a small subset of our implementation, and the feature is not yet available for use there. However, the preview does contain the working feature. Here is a quick instruction on how to adopt it.”
“This fork” is https://github.com/swiftlang/llvm-project/tree/stable/202407..., Apple’s fork of LLVM. That branch is from a year ago.
I don’t know whether there’s a newer publicly available version.
There is a GSoC 2026 opportunity on upstreaming this into mainline LLVM (https://discourse.llvm.org/t/gsoc-2026-participating-in-upst...)
Apple is shipping code built with this, and is supporting it for developers to use (see https://developer.apple.com/documentation/xcode/enabling-enh...)
As I and others noted below, it is included in Apple's clang version, which is what you get when you install the command line tools for Xcode. Try something like:

    clang -g -Xclang -fbounds-safety program.c

Bounds check failures result in traps; in lldb you get a corresponding trap message at the faulting access.

Microsoft's SAL annotations are meant to inform the static analyzer how the parameters are meant to be used, so any violations of the contract can be diagnosed at compile time. The LLVM proposal is different in that it is checked at run time and will stop your program before it makes an out-of-bounds access. Static analyzers can obviously use the information in the type to help diagnose a subset of such problems at compile time.
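To make the contrast concrete, a small sketch of the run-time model, assuming the <ptrcheck.h> header from the -fbounds-safety preview (the off-by-one is deliberate):

    #include <stddef.h>
    #include <ptrcheck.h>   /* provides __counted_by in the preview toolchain */

    int sum(int *__counted_by(len) buf, size_t len) {
        int total = 0;
        for (size_t i = 0; i <= len; i++)   /* off-by-one on purpose */
            total += buf[i];                /* SAL might flag this statically;
                                               -fbounds-safety traps when i == len */
        return total;
    }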
Niklaus Wirth died in 2024, and yet I hope he is having a major I-told-you-so moment about people who dismissed Pascal's bounds checking as unneeded and as making things slow.
My CS college used Turbo Pascal as a teaching language. I had a professor who told us "don't turn the range and overflow checking off, even when compiling for production". That turned out to be very wise advice, IMHO. Too bad C and C++ compiler/language designers never got that message. So much wasted to save a less-than-1% performance gain.
To this day, FPC uses less RAM than any C compiler, a good thing in today's increasingly RAM-starved world, and they've managed this with far fewer developers than its C compiler equivalents; I can't even imagine what it would look like if they had the same number of people working on it. C optimization tricks are hacks, the fact godbolt exists is proof that C is not meant to be optimizable at all, it is brute force witchcraft.
At a certain point though, something's gotta give, the compiler can do guesswork, but it should do no more, if you have to add more metadata then so be it it's certainly less tedious than putting pragmas and _____ everywhere, some C code just looks like the writings of an insane person.
> […] C optimization tricks are hacks, the fact godbolt exists is proof that C is not meant to be optimizable at all, it is brute force witchcraft.
> At a certain point though, something's gotta give, the compiler can do guesswork, but it should do no more, if you have to add more metadata then so be it it's certainly less tedious than putting pragmas and _____ everywhere, some C code just looks like the writings of an insane person.
There is not even a single correct or factual statement in the cited strings of words.
C optimisation is not «hacks» or «witchcraft»; it is built on decades of academic work and formal program analysis: optimisers use data-flow analysis over lattices and fixed points (abstract interpretation) and disciplined intermediate representations such as SSA, and there is academic work on proving that these transformations preserve semantics.
Modern C is also deliberately designed to permit optimisation under the as-if rule, with UB (undefined behaviour) and aliasing rules providing semantic latitude for aggressive transformations. The flip side is non-negotiable: compilers can't «guess» facts they can't prove, and many of the most valuable optimisations require guarantees about aliasing, alignment, loop independence, value ranges, and absence of UB that are often not derivable from arbitrary pointer-heavy C, especially under separate compilation.
That is why constructs such as «restrict», attributes and pragmas exist: they are not insanity, they are explicit semantic promises or cost-model steering that supply information the compiler otherwise must conservatively assume away.
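For example, «restrict» is exactly such a promise: it tells the compiler that dst and src never alias, so the loop below can be vectorized without a runtime overlap check (a minimal sketch):

    #include <stddef.h>

    void scale(float *restrict dst, const float *restrict src, size_t n) {
        for (size_t i = 0; i < n; i++)
            dst[i] = 2.0f * src[i];   /* no reload of src needed per store */
    }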
«metadata instead» is the same trade-off in a different wrapper, unless you either trust it (changing the contract) or verify it (reintroducing the hard analysis problem).
Godbolt exists because these optimisations are systematic and comparable, not because optimisation is impossible.
Also, directives are not a new, C-specific embarrassment: ALGOL-68 had «pragmats» (the direct ancestor of today’s «pragma» terminology), and PL/I had longstanding in-source compiler control directives, so this mechanism predates modern C tooling by decades.
Bounded strings turned out to be a fairly good idea as well.
There's a blog post from Google about this topic as well where they found that inserting bound checking into standard library functions (in this case C++) had a mere 0.3% negative performance impact on their services: https://security.googleblog.com/2024/11/retrofitting-spatial...
For people using Clang you can read more about libc++ hardening at https://libcxx.llvm.org/Hardening.html
> As local variables are typically hidden from the ABI, this approach has a marginal impact on it.
I'm skeptical this is workable... it's pretty common in systems code to take the address of a local variable and pass it somewhere. Many event libraries implement waiting for an event that way: push a pointer to a futex on the stack to a global list, and block on it.
They address it explicitly later:
> Although simply modifying types of a local variable doesn’t normally impact the ABI, taking the address of such a modified type could create a pointer type that has an ABI mismatch
That breaks a lot of stuff.
The explicit annotations seem like they could have real value for libraries, especially since they can be ifdef'd away. But the general stack variable thing is going to break too much real world code.
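A sketch of the ifdef pattern for library headers, assuming the preview's <ptrcheck.h>; the no-op fallback macro is our own, so annotated headers still compile on ordinary compilers:

    #include <stddef.h>

    #if defined(__has_include) && __has_include(<ptrcheck.h>)
      #include <ptrcheck.h>      /* bounds-safety toolchain */
    #else
      #define __counted_by(n)    /* expands to nothing elsewhere */
    #endif

    void fill(int *__counted_by(len) buf, size_t len);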
I don't understand this example: you're taking the address of a local-scope stack object, storing it in a global list, and then using this address elsewhere in the code, possibly at a different point in time, to manipulate the object? I am obviously missing something, because this cannot work unless the object lives on the stack of main().
The best example I know of off the top of my head is wait_event() in Linux.
So long as the thread is guaranteed not to exit while blocked, you know its stack, and therefore the object allocated on it, must continue to exist. So, as long as there is no way to wake the thread except by kicking that object, the memory backing it is guaranteed to continue to exist until that object is kicked. You do have to somehow serialize the global data structure lookup (e.g. lock/dequeue/unlock/kick), if multiple threads can find and kick the object concurrently that's unsafe (the thread might exit between the first and subsequent kicks).
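A minimal userspace sketch of that pattern (enqueue_waiter and block_on are hypothetical stand-ins for the locked list manipulation and the futex-style wait):

    #include <stddef.h>
    #include <stdatomic.h>

    struct waiter {
        struct waiter *next;
        atomic_int     woken;   /* the futex word in a real implementation */
    };

    void enqueue_waiter(struct waiter *w);   /* hypothetical: lock, push, unlock */
    void block_on(atomic_int *addr);         /* hypothetical: futex-style wait */

    void wait_for_event(void) {
        struct waiter w = { .next = NULL, .woken = 0 };
        enqueue_waiter(&w);
        while (!atomic_load(&w.woken))
            block_on(&w.woken);
        /* we cannot return until woken, so &w stayed valid for every user */
    }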
Generally that's true, even in pthread userspace: while there are some obvious artificial counterexamples one can construct, real world code very rarely does things like that.
Ok, I see, thanks for the example. Is this technique used to avoid the potential runtime performance cost, since one would otherwise need to keep that object elsewhere (on the heap) rather than on the stack? Or is the problem something else?
It's just mechanically simpler that way. If the wakee thread dynamically allocated the object, it would have to free it after being woken: may as well let the compiler do that automatically for us.
Yep, it's a straight up error in C to return the address of a local variable from a function outside of main. Valgrind will flag this as use of an uninitialized value.
The problem is that as long as it's something where the calling function checks it immediately after the function exits and never looks again (something like an error code or choosing a code path based on the result) they often get away with it, especially in single threaded code.
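The classic instance of what's being described; compilers warn about the direct form (GCC's -Wreturn-local-addr), yet callers that read the value immediately often appear to get away with it:

    int *f(void) {
        int local = 42;
        return &local;   /* dangling: local's storage dies when f returns */
    }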
I'm running into this at this very moment as I'm trying to make my application run cleanly, but some of the libraries are chock full of this pattern. One big offender is the Unix port of Microsoft's ODBC library, at least the Postgre integration piece.
I also blame the Unix standard library for almost having this pattern but not quite. Functions that return some kind of internal state that the programmer is told not to touch. Later they had to add a bunch of _r variants that were thread safe. The standard library functions don't actually have this flaw due to how they define their variables, but from the outside it looks like they do. It makes beginning programmers think that is how the functions should work and write their code in a similar manner.
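strtok is the canonical example of that almost-pattern: hidden static state in the library, later patched over with a _r variant that makes the state an explicit parameter:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char line[] = "a,b,c";
        char *save = NULL;   /* the state strtok keeps in a hidden static */
        for (char *tok = strtok_r(line, ",", &save); tok != NULL;
             tok = strtok_r(NULL, ",", &save))
            printf("%s\n", tok);
        return 0;
    }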
> Yep, it's a straight up error in C to return the address of a local variable from a function
Sure, that's true, but nobody is suggesting returning the address of a local variable anywhere in this thread.
I'm describing putting a pointer to a local variable in a global data structure, which is safe so long as the function doing it is somehow guaranteed not to return until the pointer is removed from the global data structure.
I would imagine variables that are passed to functions would be considered ABI-visible. If the compiler is smart enough, it can keep the pointer wide when it’s passed to a function that’s also being compiled and act accordingly on the other side, but that worries me because this new meaning of “pointer” is propagating to parts of the code that might not necessarily agree with it.
I want an OS distro where all C code is compiled this way.
OpenBSD maybe? or a fork of CheriBSD?
macOS clang has supported -fbounds-safety for a while, but I'm not sure how extensively it is used.
Maybe this:
https://fil-c.org/pizlix
>Pizlix is LFS (Linux From Scratch) 12.2 with some added components, where userland is compiled with Fil-C. This means you get the most memory safe Linux-like OS currently available.
The author, @pizlonator, is active on HN.
I'm aware of Pizlix - it's a good project/idea that needs to go mainstream; as you mention, memory safety is currently limited to userland (still a huge improvement over traditional unsafe userland.)
Note also that it uses fil-c rather than clang with -fbounds-safety. I believe fil-c requires fewer code changes than -fbounds-safety.
I created https://github.com/hsaliak/filc-bazel-template recently to make it super easy to get started with fil-c projects. If you find it daunting to get started with the setup in the core distribution and want a 3-4 step approach to building a fil-c enabled binary, then try this.
hot dang that's neato. shame about the name, though.
You need to annotate your program with indications of what variable tracks the size of the allocation. So, sure, but first work on the packages in the distro.
Note that corresponding checks for C++ library containers can be enabled without modifying the source. Google measured some very small overhead (< 0.5% IIRC) so they turned it on in production. But I'd expect an OS distro to be mostly C.
[1] https://libcxx.llvm.org/Hardening.html
Get Gentoo, add this to CFLAGS, and start fixing everything that breaks. Become a hero.
It is called Solaris, and it has had this enabled since 2015 on SPARC.
https://docs.oracle.com/en/operating-systems/solaris/oracle-...
Might as well not even talk about anything with the Oracular kiss of death.
Aren’t Illumos and OpenIndiana doing the same?
I still remember someone at Sun commented they treated warnings as errors. This is how software should be developed.
The feature is only on SPARC, not x86. Oracle killed in-house SPARC development in 2017, and they abandoned OpenSPARC after they acquired Sun, so it's effectively a dead architecture. The software won't work without the hardware to run it on.
Fujitsu also does SPARC, and contrary to HP-UX, people still do buy Solaris.
EDIT:
https://www.oracle.com/servers/sparc/
https://www.fujitsu.com/global/products/computing/servers/un...
Finally, it is up to Intel and AMD to come up with hardware memory tagging; so far they have messed up all attempts, with MPX being the last short-lived one.
It's good info, and I wouldn't rush a migration off of SPARC systems if I was already using them, but slow death is still death. It was already worrying that workstations were killed off by Sun before the Oracle acquisition; it seems quite clear that no one has been serious about spreading adoption of the architecture for more than two decades now.
Even Fujitsu has been moving away from SPARC. What was the last SPARC Fujitsu designed?
What matters is that they are still selling them.
Not everyone suffers from Oracle phobia.
Some of us actually do read licenses before using products.
Also, the FAANGs are hardly any better; they only seem so because they spew cool marketing stuff like "do no evil".
>I want an OS distro where all C code is compiled this way.
You first have to modify "all C code". It's not just a set and forget compiler flag.
Indeed. I still want it.
Fedora and its kernels are built with GCC's _FORTIFY_SOURCE, and I've seen modules crash on out-of-bounds reads.
_FORTIFY_SOURCE is way smaller in scope (as in, it closes fewer vulnerabilities) than -fbounds-safety.
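A sketch of the scope difference: _FORTIFY_SOURCE (with -O2 -D_FORTIFY_SOURCE=2 on glibc) only checks calls where the compiler can see the destination's size via __builtin_object_size, while -fbounds-safety can cover pointers whose bounds are only known through annotations:

    #include <string.h>

    void copy_in(const char *src) {
        char dst[8];
        strcpy(dst, src);   /* fortified: __strcpy_chk aborts at run time
                               if src needs more than 8 bytes */
    }

    void copy_out(char *dst, const char *src) {
        strcpy(dst, src);   /* dst's size is unknown here, so fortify
                               cannot add a check */
    }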
What are you hoping it will achieve?
The internet went down because Cloudflare used a bad config... a config parsed by a Rust app.
One of these days the witch hunt against C will go away.
The internet didn't go down and you're mischaracterizing it as a parsing issue when the list would've exceeded memory allocation limits. They didn't hardcode a fallback config for that case. What memory safety promise did Rust fail there exactly?
I think the point is memory bugs are only one (small) subset of bugs.
The conventional wisdom is ~70% of serious security bugs are memory safety issues.
https://www.cisa.gov/sites/default/files/2023-12/CSAC_TAC_Re...
Security bugs - and not bad security processes - are a small subset of bugs.
A panic in Rust is easier to diagnose and fix than some error or garbage data caused by an out-of-bounds access in some random place in the call stack.
A service going down is a million times better than being exploited by an attacker. If this is a witch hunt then C is an actual witch.
Why can it be exploited? I’ve configured my OS so my process is isolated to the resources it needs.
What language is your OS written in?
It’s written in C, I’m glad you asked. Do you have any exploits in the Linux process encapsulation to share?
Surely you're not suggesting that the Rust compiler never produces exploitable code?
I probably don’t have such an exploit, since you’re probably running something up to date. There have been many in the past. I doubt the last one to be fixed is the last one to exist.
If your attitude is that getting exploited doesn’t matter because your software is unprivileged, you need some part of your stack to be unexploitable. That’s a tall order if everything is C.
You can get exploitable code out of any compiler. But you’re far more likely to get it from real-world C than real-world Rust.
> you need some part of your stack to be unexploitable.
Kernel level process isolation is extremely robust.
> If your attitude is that getting exploited doesn’t matter because your software is unprivileged
It’s not that exploits don’t matter. It’s that process architecture is a stronger form of guarantee than anything provided by a language runtime.
I agree that the place where rust is most beneficial is for programs that must be privileged and that are likely to face attack - such as a web server.
But the idea that you can’t securely use a C program in your stack or that rust magically makes process isolation irrelevant is incorrect.
How can process architecture be a stronger guarantee than anything provided by a language runtime when it is enforced by software written in a language?
You have a process receiving untrusted, potentially malicious input from the outside. If there’s an exploit then an attacker can potentially take control of the process. Your process is isolated, that’s good. But it can still communicate with other parts of your system. It can make syscalls. Now you’re in the same situation where you have a program receiving untrusted, potentially malicious input from the outside, but now “the outside” is your subverted process, and “a program” is the kernel. The same factors that make your program difficult to secure from exploits if it’s written in C also apply to the kernel.
I’m not sure where those ideas at the end of your comment came from. I certainly didn’t say them.
Does any distro use clang? I thought all Linux kernels were compiled using gcc.
Chimera does, it also has a FreeBSD userland AFAIU.
https://chimera-linux.org/
hm this one is interesting. Thanks for sharing!
https://www.kernel.org/doc/html/latest/kbuild/llvm.html
> The Linux kernel has always traditionally been compiled with GNU toolchains such as GCC and binutils. Ongoing work has allowed for Clang and LLVM utilities to be used as viable substitutes. Distributions such as Android, ChromeOS, OpenMandriva, and Chimera Linux use Clang built kernels. Google’s and Meta’s datacenter fleets also run kernels built with Clang.
Not a Linux distro, but FreeBSD uses Clang.
And Android uses Clang for its Linux kernel.
-fbounds-safety is not yet available in upstream Clang though:
> NOTE: This is a design document and the feature is not available for users yet.
Very cool. I always wondered why there isn't something like this in GCC/LLVM; it would obviously solve countless security issues.
Xcode (AppleClang) has had -fbounds-safety for a while now. What is the delay in getting this merged into upstream LLVM?
This is amazing. Counter to what most people think, the majority of memory bugs are from out-of-bounds access, not stuff like forgetting to free a pointer or some such.
Personally, as someone who has worked in C and C++ for the last few years: memory access is almost never the root bug. It's almost always logic errors. Not accounting for all paths, not handling edge cases, not being able to handle certain combinations of user or file input, etc.
Occasionally an out-of-bounds access pops up, but they're generally so blindingly obvious and easy to fix that it's never been the slow part of bug fixing.
I've been programming for a long time; the ratio of memory errors to logic bugs in production is so low as to be non-existent.
My last memory error in C code in production was in 2018. Prior to that, I had a memory error in C code in production in 2007 or 2008.
In C++, I eventually gave up trying to ship the same level of quality and left the language altogether.
The wider industry data gathered indicates that for memory unsafe languages 80% of issues are due to memory vulnerabilities, including mature codebases like Linux kernel, curl, V8, Chrome, Mach kernel, qemu etc etc etc. This doesn’t mean that logic bugs are less common, it just means that memory safety issues are the easiest way to get access.
As for why your experience may be different, my hunch is that either your code was super simple OR you didn’t test it thoroughly enough against malicious/unexpected inputs OR you never connected the code to untrusted I/O.
Keep in mind the data for this comes from popular projects that have enough attention to warrant active exploit research by a wide population. This is different from a project you wrote that doesn’t have the same level of attention.
> The wider industry data gathered indicates that for memory unsafe languages 80% of issues are due to memory vulnerabilities, including mature codebases like Linux kernel, curl, V8, Chrome, Mach kernel, qemu etc etc etc.
You are misremembering the various reports - the reports were not that 80%[1] of issues were due to memory errors, but more along the lines of 80% of exploits were due to memory errors.
You could have 1000 bugs, with 10 of them being vulnerabilities, and 8 of those 10 being due to memory errors, and that would still be in line with the reports.
> As for why your experience may be different, my hunch is that either your code was super simple OR you didn’t test it thoroughly enough against malicious/unexpected inputs OR you never connected the code to untrusted I/O.
Payments processing, telecoms and munitions control software.
Of those, your explanation only applies to telecoms; payments processing (EMV) was basically under a constant stream of daily attacks, while munitions are live, in the field, with real explosives. With the munitions, we would've noticed any bugs, not just memory-error bugs.
--------------------
[1] The number wasn't 80% IIRC, more like 70%?
Sorry, I didn’t misremember, but I wrote it down without proof-checking (see another comment where I got it right). I did indeed mean that 80% of security vulnerabilities are caused by memory safety issues.
For EMV you had C connected directly to the network under a steady stream of attacks and only had an issue once? I find that hard to believe. What’s more likely is a Java webserver frontend talking to some C processing / crypto code, in which case, again, you’re less likely to encounter bugs in your code because it’s difficult to find a path to injecting unsanitized input.
For munitions there’s not generally I/O with uncontrolled input so it’s less likely you’d find cases where you didn’t properly sanitize inputs and relied on an untrusted length to access a buffer. As a famous quote states, it’s ok if your code has an uptime of 3 minutes until the first bug if the bomb explodes in 2
> For EMV you had C connected directly to the network under a steady stream of attacks and only had an issue once? I find that hard to believe. What’s more likely is a Java websever frontend talking to some C processing / crypto
EMV terminals. No Java involved.
> As a famous quote states, it’s ok if your code has an uptime of 3 minutes until the first bug if the bomb explodes in 2
Look, first you commented that it's not possible for nontrivial or non-networked devices, now you're trivialising code that, if wrong, directly killed people!
All through the 80s, 90s and 2000s (and even now, believe it or not), the world was filled with millions and millions of devices programmed in C, and yet you did not live a life where all the devices around you routinely crashed.
Cars, microwaves, security systems... they didn't routinely crash even though they were written in C.
EMV terminals are not under daily cybersecurity attack - you need physical access unless you designed your system weirdly. You probably had loads of vulnerabilities. But also, depending on when you did it, all you had to process was a bar code, which isn’t some super complicated task.
I’m not trivializing the safety of munitions. I’m attempting to highlight that safety and stability in a munitions context is very different and memory safety issues could easily exist without you realizing. My overall point is that you are silently making the argument that C programmers (or programmers in general) used to be better, which is a wild argument to be making about a culture in which fuzzing didn’t even exist as a concept. You’re also confusing memory safety with implying a crash. That simply isn’t the case - it’s more often exploitable as a security vulnerability than an immediate crash by violating assumptions made that weren’t in the happy path of those microwaves and security systems. That millions of devices were and still are routinely exploitable.
You're also following a fallacious line of reasoning that the C of today is the same C that was in use in the 80s, 90s, and 2000s. It's not: it has gotten harder and more dangerous because a) multithreading became more of a thing and b) compiler authors started exploiting "undefined" behavior for extra optimization.
It's just wild for me to encounter someone who believes C is a safe language that is suitable to connect to I/O, when there's so much anecdotal and statistical evidence gathered that that's not the case. Even SQLite, the darling of the C community, is not safe if asked to open arbitrary SQLite files - there are various known and possible security attacks.
> EMV terminals are not under daily cybersecurity attack - you need to have physical access unless you designed your system weirdly.
They are under daily attack - in public, at tills, operated by minimum-wage earners.
> You probably had loads of vulnerabilities.
Sure. Hundreds of thousands of terminals sitting in the field, networked, under the control of minimum wage employees, each holding credit card details for hundreds of cards at a time...
Yeah, you're right, not a target at all!
> But also, depending on when you did it, all you had to process was a bar code, which also isn't some super complicated task.
You are hopelessly naive. Even in the magstripe era, certification was not easy.
> It’s just wild for me to encounter someone who believes C is a safe language
When did you meet this person?
Look, the bottom line is, the rate of errors due to memory safety in programs written in C is so small it's a rounding error. It's not even statistical noise. You spent your life surrounded by these programs that, if they went wrong, would kill you, and yet here you are, not only arguing from a place of ignorance but reveling in it.
Just out of interest, have you ever used an LLM to write code for you?
Physical attacks are difficult to pull off at scale, especially anonymously. There's a huge evidence trail linking the people involved to the scheme. And a device being in the hands of a minimum wage employee is very different from a bored, talented, and highly skilled person probing your software remotely. Now who's naive?
As for certification being difficult, what does that have to do with the price of bread in Paris? Unless you're somehow equating certification with a stamp of imperviousness to vulnerabilities, in which case you're seeing your own naivete instead of others'. Btw, Target was fully certified and still had their payment system breached - not through the terminals but through the PoS backend. And as for "but you're here living and breathing", there are constant security breaches through whatever hole, memory safety or otherwise. Persistent access into a network is generally only obtainable through credential compromise or memory safety.
> When did you meet this person?
You. You're here claiming that memory safety issues are statistical noise, yet every piece of cloud software I've seen deployed regularly had them in the field, sometimes even letting a bad one through to canary. And memory safety issues persisted despite repeated attempts to fix them, and you couldn't even tell whether something was legitimately a software issue or just a HW flaw, because you were deployed at a large enough scale to be observing bad components. It's a real problem, and claiming it's statistical noise ignores the consequences of even one such issue being easily accessible.
> You. You’re here claiming that memory safety issues are statistical noise yet
Claiming that the exploit rate percentage is statistical noise is different from claiming that it's a safe language.
Looks like you came in with a premade argument to make.
You haven't answered my question, though: Have you used LLMs to generate any code for yourself?
Yes. The problem is that most memory errors (out of bounds + use after free etc.) result in a vulnerability. Only a minority of the logic errors do.
For operating system kernels, browsers, etc., vulnerabilities have a much, much bigger impact than logic errors: vulnerabilities need to be fixed immediately, and the fix released immediately. Most logic errors don't need to be fixed immediately (sure, it depends on the issue and on the type of software).
I would probably say "for memory unsafe languages, 80% of the _impact_ is due to memory vulnerabilities"
logic errors aren't memory errors, unless you have some complex piece of logic for deallocating resources, which, yeah, is always tricky and should just generally be avoided
"Majority" could mean a few things; I wouldn't be surprised if the majority of discovered memory bugs are spatial, but I'd expect the majority of widely exploited memory bugs to be temporal (or pseudo-temporal, like type confusions).
I think UAFs are more common in mature software
Or type confusion bugs, or any other stuff that stems from complex logic having complex bugs.
Boundary checking for array indexing is table stakes.
table stakes, but people still mess up on it constantly. The "yeah, but that's only a problem if you're an idiot" approach to this kind of thing hasn't served us very well so it's good to see something actually being done.
Trains shouldn't collide if the driver is correctly observing the signals, that's table stakes too. But rather than exclusively focussing on improving track to reduce derailments we also install train protection systems that automatically intervene when the driver does miss a signal. Cause that happens a lot more than a derailment. Even though "pay attention, see red signal? stop!" is conceptually super easy.
I'm not saying it's not important; it is. I just don't believe that '[the] majority of memory bugs are from out of bounds access'. That was maybe true 20 years ago, when an unbounded strcpy to an unprotected return pointer on the stack was super common and exploiting that kind of vulnerability was what most vulndev was about.
This brings C one tiny step closer to the state of the art, which is commendable, but I don't believe codebases which start using this will reduce their published vulnerability count significantly. Making use of this requires effort and diligence, and I believe most codebases that can expend such effort already have a pretty good security track record.
The majority of security vulnerabilities in languages like C that aren’t memory safe are due to memory safety issues like UAF, buffer overflows etc etc. I don’t think I’ve seen finer grained research that tries to break it out by class of memory safety issue. The data is something like 80% of reported vulnerabilities in code written in these languages are due to memory safety issues. This doesn’t mean there aren’t other issues. It just means that it’s the cheapest exploit to search for when you are trying to break into a C/C++ service.
And in terms of how easy it is to convert a memory safety issue into an exploit, it’s not meaningfully much harder. The harder pieces are when sandboxing comes into play so that for example exploiting V8 doesn’t give you arbitrary broader access if the compromised process is itself sandboxed.
There is use after free
Majority. Parent said majority
Exactly. Use after free is common enough that you can't just assert that out-of-bounds is the majority without evidence.
Actually, you may be right: according to Google's Project Zero [1], ~50% is use after free and only ~20% is out of bounds errors. However, this is for errors that resulted in major exploits; I'm not sure what the overall data is.
[1] https://projectzero.google/2022/04/the-more-you-know-more-yo...
Exciting! It doesn't imply that we should now sprinkle the new annotations everywhere. We should still keep working with proper iterators and robust data structures; it's the implementations of those that would need to add such annotations.
> To tackle this issue, the model incorporates the concept of a “wide pointer” (a.k.a. fat pointer) – a larger pointer that carries bounds information alongside the pointer value.
Bounds checking with fat pointers existed as a set of patches for GCC in the early 2000s (C front end only).
https://sourceforge.net/projects/boundschecking/
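To illustrate the concept the article quotes: a minimal hand-rolled sketch of a fat pointer in plain C (this is just an analogy, not the actual representation or API that -fbounds-safety uses; all names are made up):

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* A "wide" pointer: the pointer value travels with its bounds, so
       every access can be checked at run time. */
    typedef struct {
        int *ptr;    /* current pointer value */
        int *lower;  /* lowest valid address */
        int *upper;  /* one past the last valid element */
    } wide_int_ptr;

    static int wide_load(wide_int_ptr p, ptrdiff_t i) {
        if (p.ptr + i < p.lower || p.ptr + i >= p.upper) {
            fprintf(stderr, "out-of-bounds access\n");
            abort();  /* -fbounds-safety likewise traps on a failed check */
        }
        return p.ptr[i];
    }

    int main(void) {
        int buf[4] = {1, 2, 3, 4};
        wide_int_ptr p = {buf, buf, buf + 4};
        printf("%d\n", wide_load(p, 3));  /* ok: prints 4 */
        wide_load(p, 4);                  /* out of bounds: aborts */
        return 0;
    }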
Amazing, this is a life-saving feature for C developers. Apparently it's not complete yet? I will apply this to my code once the feature is included in LLVM and GCC.
Would be nice if the annotations could also be applied to structure fields.
Clang has this and upcoming GCC will also have this: https://godbolt.org/z/KETrPEnT1
This is awesome!!
counted_by for struct fields is actually the part that AFAIK works today: https://embeddedor.com/blog/2024/06/18/how-to-use-the-new-co...
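Adoption looks roughly like this (a sketch based on that post; the struct and helper here are made up, not from the article):

    #include <stdlib.h>

    /* counted_by tells the compiler that `count` holds the number of
       elements in `data`, so __builtin_dynamic_object_size() and
       -fsanitize=array-bounds can check accesses against it. */
    struct packet {
        size_t count;
        int data[] __attribute__((counted_by(count)));
    };

    struct packet *packet_new(size_t n) {
        struct packet *p = malloc(sizeof *p + n * sizeof(int));
        if (p)
            p->count = n;  /* set the count before touching data */
        return p;
    }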
That's amazing. Thanks for that reference. If it's good enough for the kernel, then it's good enough for me to start using in my own projects.
It's really cool that the kernel is using this. The compiler must be generating simple bounds checking code with traps instead of crazy stuff involving magical C standard library functions. Perfect for freestanding nostdlib projects.
Or just do it in C.
https://godbolt.org/z/TvxseshGc

Still requires a gcc/clang specific extension (although this one I'd be very happy to see standardized).
Only statement expressions, but one can also implement this without them.
Still, please standardize them :).
I don't use much C but if you add them to the standard they'll probably trickle down to C++ compilers by 2045 and I'll have a good 10 years to use them before I retire.
You are not happy with immediately invoked lambda expressions?
Well, I'll take them over the nothing you're giving me :D
But in all seriousness, I want this:
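Something along these lines - a sketch using the GNU statement-expression extension (the TRY name and the zero-means-success convention are just illustrative):

    /* A Rust-"?"-style TRY macro built on GNU statement expressions.
       On a nonzero error code it returns from the enclosing function,
       which an immediately invoked lambda cannot do. */
    #define TRY(expr)             \
        ({                        \
            int rc_ = (expr);     \
            if (rc_ != 0)         \
                return rc_;       \
            rc_;                  \
        })

    int read_config(void);
    int open_socket(void);

    int setup(void) {
        TRY(read_config());  /* a failure propagates to setup()'s caller */
        TRY(open_socket());
        return 0;
    }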
Can't be done with lambdas since the macro needs to return out of the actual function, not the lambda. Pretty much Rust's question mark, but without Rust, or Zig's "try" but without Zig.

I see, thanks! There is general consensus that statement expressions should become part of ISO C, but some lack of time to get it done. I am not part of WG21 though, so can't say anything about C++.
The fact that pointer types can't be used with this pattern without a typedef still seems kinda primitive to me.
You can use pointer types by using a typedef first, but I agree this is not nice (I hope we will fix this in future C). But then, I think this is a minor inconvenience for having an otherwise working span type in C.
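To illustrate the typedef workaround (a sketch of this kind of span macro, not the exact code from the godbolt link):

    #include <stddef.h>

    /* The element type is pasted into the struct tag, so it must be a
       single identifier - "int *" cannot be token-pasted. */
    #define SPAN_DECL(T) typedef struct { T *ptr; size_t len; } span_##T

    SPAN_DECL(int);          /* fine: declares span_int */
    /* SPAN_DECL(int *);        error: "int *" is not an identifier */

    typedef int *int_ptr;    /* the typedef workaround */
    SPAN_DECL(int_ptr);      /* fine: span_int_ptr over int* elements */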
The extension is for hardening legacy C code without breaking ABI.
Even better, starting with C++26 (and, via DRs, considered to apply to previous versions), hardened runtimes now have a portable way to be configured across compilers, instead of each having its own approach.
However, you still need something like -fbounds-safety in C++, due to the copy-paste compatibility with C and the too many people writing Orthodox C++, C with Classes, or Better C kinds of code that we cannot get rid of.
I'm sure std::span is great, but I like mine better :)
I find it a bit hard to justify using the STL when a single <unordered_map> include costs 250ms compile time per compile unit.
The fact that I don't have to step through this in the debugger is also a bonus:
Only if you're not able to do import std or use pre-compiled headers, and not using modern IDEs with "just my code" filters.
As someone who has enjoyed C++ since 1993, alongside other ecosystems: many of the pain points people complain about when using C++ are self-inflicted, by avoiding modern tools.
Heck, C++ had nice .NET- and Java-like frameworks, with bounds checking even, before those two ecosystems came to exist, and nowadays all those frameworks are mostly gone, with the exception of the Qt and C++ Builder ones, due to bias.
You should tell the LLVM folks, I guess they didn't know about this.
and if you write directly in assembly you don't even need a C++ compiler
That's an objectively correct statement, but I don't see how it makes sense as a response to my comment, as I'm advocating to use the more advanced feature-rich tool over the compiler-specific-hacks one.
If you're advocating switching languages, then there's no reason to stop at C++. It's more common to propose just converting the universe to Rust, but assembly also enjoys the possibility of being fairly easy to drop in on an existing C project.
> I don't see how it makes sense as a response to my comment
Your comment started out with "just."
As if there are never any compelling reasons to want to make existing C code better.
But instead of taking that as an opportunity to reflect on when various tools might be appropriate,
> as I'm advocating to use the more advanced feature-rich tool over the compiler-specific-hacks one.
You've simply doubled down.
I looked at trying to adopt -fbounds-safety and -Wunsafe-buffer-usage on a reasonably large codebase (4,000 C and C++ files), and it's basically impossible.
You have to instrument every single file. It can be done in stages, though: just turn the flag on file by file. The xnu kernel is _mostly_ instrumented with -fbounds-safety.
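In practice the staged migration is just per-file flags, something like this (file names hypothetical; the flag spelling follows the clang preview):

    clang -c -g -Xclang -fbounds-safety parser.c   # already migrated
    clang -c -g buffer.c                           # not yet migrated
    clang -g parser.o buffer.o -o app              # link as usual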
Plug: In theory you could auto-convert to a memory-safe subset of C++ as a build step. Auto-converted code would have some run-time overhead, but you can mark any performance-sensitive parts of the code to be exempt from conversion. And you get lifetime and type safety too. For full coverage, performance-sensitive parts of the code can be manually converted to the safe subset to minimize overhead. (Interfaces in extern C blocks remain unconverted by default to maintain ABI compatibility.)
[1]: https://duneroadrunner.github.io/scpp_articles/PoC_autotrans...
This sounds like the kind of low-thought pattern-based repetitive task where you could tell an LLM to do it and almost certainly expect a fully correct result (and for it to find some bugs along the way), especially if there's some test coverage for it to verify itself against. If you're skeptical, you could tell it to do it on some files you've already converted by hand and compare the results. This kind of thing was a slam dunk for an LLM even a year or two ago.
There is GWP-ASan, which has lower overhead than ASan but still is not super popular.
Because it can only catch a subset of issues; it's not guaranteed to catch them (it's probabilistic); even issues it "could" catch may not be caught due to the temporal distance between the free and a subsequent use; and it requires a different allocator that supports it. It's also unclear to me how it knows whether a given free is for a sampled or unsampled region - I suspect it must capture all free/realloc calls to accomplish that, which does imply all of these calls are intercepted.
It’s nowhere near the same as robust bounds checking.
ASAN/LSAN is amazing. It absolutely monkey-hammers performance though.
> ASAN/LSAN is amazing. It absolutely monkey-hammers performance though.
It's not so bad; until the sanitisers arrived all we had was valgrind :-/
The sanitisers are about 10x to 50x faster than valgrind.
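For reference, a typical dev-build invocation (on Linux, -fsanitize=address also enables LeakSanitizer's leak report at exit; on other platforms leak detection may need opting in):

    clang -g -fsanitize=address program.c -o program
    ./program    # out-of-bounds, use-after-free, and leaks get stack traces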
The real question is adoption friction. The annotation requirement means this won't just slot into existing codebases — someone has to go through and mark up every buffer relationship. Google turning on libcxx hardening in production with <0.5% overhead is compelling precisely because it required zero source changes.
The incremental path matters more than the theoretical coverage. I'd love to see benchmarks on a real project — how many annotations per KLOC, and what % of OOB bugs it actually catches in practice vs. what ASAN already finds in CI.
The WebKit folks have apparently been very successful with the annotations approach [0]. It's a shame that a few of the loudest folks in WG21 have decided that C++ already has the exact right number of viral annotations, and that the language couldn't possibly survive this approach being standardized.
[0] https://www.youtube.com/watch?v=RLw13wLM5Ko