If this is to be a real, (relatively) widely-used language, I would make some tough choices on where to innovate, and where to just leave things the same.
One thing I noticed in the example is `num target`, especially because the focus is on "clarity". When I read the example, I was sure that `num` would be something like the JavaScript `Number` type. But to my surprise, it's just a 64-bit integer.
For an extremely long time, languages have had "int", "integer", "int64", and similar. If you aim for clarity, I would strongly advise to just keep those names and don't try to invent new words for them just because. Both because of familiarity (most programmers coming to your language will already be familiar other languages which have "int(eger)"), and because of clarity ("int(eger)" is unambiguous, it is a well defined term to mean a round number; "num" is ambiguous and "number" can mean any type of number, e.g. integer, decimal, imaginary, complex, etc).
The clearest option is when the data types are fully explicit, e.g. `int64` (signed), `uint64` (unsigned), `int32`, etc.
[author here] That’s a good point. I can see why int might be clearer than num, especially given the long history of that naming. I’ll think about it.
Definitely int for signed numbers. But I would call it "int64".
Clarity means saying what you mean. The typename int64 could not be clearer that you are getting 64 bits.
This is consistent with your (num32 -->) "int32".
And it would remain consistent if you later add smaller or larger integers.
This also fits your philosophy of letting the developer decide and getting out of their way. I.e. don't use naming to somehow shoehorn in the "standard" int size. Even if you would often be right. Make/let the developer make a conscious decision.
Later, "int" could be a big integer, with no bit limit. Or the name will be available for someone else to create that.
I do like your approach.
(For unsigned, I would call them "nat32", "nat64", if you ever go there. I.e. unsigned int is actually an oxymoron. A sign is what defines an integer. Natural numbers are the unsigned ones. This would be a case of using the standard math term for its standard meaning, instead of the odd historical accident found in C. Math is more universal, has more lasting and careful terminology - befitting universal clarity. I am not a fan of new names for things for specialized contexts. It just adds confusion or distance between branches of knowledge for no reason. Just a thought.)
I would recommend outright copying Rust.
Among other things, it's a systems programming language and hence its naming scheme is largely (if not entirely) compatible with modern C++ types.
I.e.:
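(Filling in the list that the "I.e.:" presumably introduced: a sketch of the usual correspondence, written as C++ aliases. The Rust names in the comments are Rust's standard primitive types; the C++ side is plain <cstdint>/<cstddef>.)

    #include <cstddef>
    #include <cstdint>

    // Rust name -> closest C++ type (the mapping presumably meant above):
    using i32   = std::int32_t;    // Rust i32
    using i64   = std::int64_t;    // Rust i64
    using u32   = std::uint32_t;   // Rust u32
    using u64   = std::uint64_t;   // Rust u64
    using usize = std::size_t;     // Rust usize (width follows the platform)
    using f32   = float;           // Rust f32
    using f64   = double;          // Rust f64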
> 3. Values follow a strict rule: primitives pass by value, containers pass by read-only reference. This prevents accidental aliasing/mutation across scopes and keeps ownership implicit but predictable.
There are plenty of languages where functions cannot mutate their parameters or anything their parameters reference — Haskell is one example. But these languages tend to have the ability to (reasonably) efficiently make copies of most of a data structure so that you can, for example, take a list as a parameter and return that list with one element changed. These are called persistent data structures.
Are you planning to add this as a first-class feature? This might be complex to implement efficiently on top of C++’s object model — there’s usually a very specialized GC involved.
[author here] ROX avoids implicit structural sharing and persistent data structures. Allocation and mutation are explicit - if I want a modified container, I construct one.
This is intentionally more resource-intensive. ROX trades some efficiency for simplicity and predictability.
The goal is clarity of logic and clarity of behavior, even at slightly higher cost. And future optimizations should preserve that model rather than hide it.
I would check your “slightly”. If I have an algorithm that operates on an n-element data structure through a helper function (this includes almost any nontrivial program - think managing caches, small databases or tables, lists of clients, etc), you get an extra multiplicative factor of n for each operation. All those nice linear-time or n-log-n algorithms turn into n^2, and accidentally quadratic programs can be bad news.
And if the language offers no facility at all to get the factor of n back, users may be forced to use something else.
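(To make the cost concrete: a minimal C++ sketch of the pattern being described, with a hypothetical with_updated helper. Each call copies the whole container, so the "linear" pass below is quadratic.)

    #include <cstddef>
    #include <vector>

    // Hypothetical helper in a copy-everything model: to get a modified
    // container you construct a new one, so each call copies n elements.
    std::vector<int> with_updated(std::vector<int> xs, std::size_t i, int v) {
        xs[i] = v;    // xs is already a private copy (passed by value)
        return xs;    // hand back the modified copy
    }

    int main() {
        std::vector<int> data(100000, 0);
        // n calls, each copying n elements: O(n^2) instead of O(n).
        for (std::size_t i = 0; i < data.size(); ++i)
            data = with_updated(data, i, 42);
    }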
I've seen some C++ libraries that implement persistent data structures, like immer (https://github.com/arximboldi/immer) - but it seems to require the Boehm GC (which is notoriously slow, since it is a conservative GC and cannot exploit any of the specific semantics/runtime characteristics of the language you're making).
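(For reference, a minimal usage sketch of immer's value-semantics API under its default configuration; push_back and set return new vectors instead of mutating, sharing structure with the old ones.)

    #include <immer/vector.hpp>

    int main() {
        immer::vector<int> v;
        for (int i = 0; i < 1000; ++i)
            v = v.push_back(i);     // returns a new vector; the old value is untouched
        auto v2 = v.set(10, 42);    // path-copying update; v and v2 share most nodes
    }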
Brainstorming a bit, you could get there via hazard pointers or deferred reclamation, but yeah, I guess that falls into the specialized-GC kind of solution.
Comments like amluto's above are the reason my time spent on HN is not wasted.
I don’t see the relevance of special GC.
But yes you need immutable data structures designed for amortized efficient copy.
This is an interesting line in the readme:
> The language forces clarity — not ceremony.
I find this statement curious, because a language like this, without the ability to build abstractions, forces exactly the opposite.
Yeah, this seems to be a common thing nowadays, although often with the value cited as "simplicity". I've always found it a bit odd because it seems to me like there are tradeoffs where making things at one level of granularity more clear or simple (or whatever you want to call it) will come at the cost of making things less clear and simple if you zoom in or out a bit at what the code is doing. Assembly is more "clear" in terms of what the processor is doing, but it makes the overall control flow and logic of a program less clear than a higher level language. Explicitly defining when memory is allocated and freed makes the performance characteristics of a program more clear, but it's "ceremony" compared to a garbage collected language that doesn't require manually handling that by default.
My fundamental issue with this sort of prioritization is that I think there's a lot of value in being able to jump between different mental models of a program, and whether something is clear or absolutely ridden with "ceremony" can be drastically different depending on those models. By optimizing for exactly one model, you're making programs written in that language harder to think about in pretty much every other model, while quickly hitting diminishing returns on how useful it is to try to make that one level of granularity even more clear. This is especially problematic when trying to debug or optimize programs after the initial work to write them is complete; having it be super clear what each individual line of code is doing in isolation might not be enough to help me ensure that my overall architecture isn't flawed, and similarly, having a bunch of great high-level abstractions won't necessarily help me notice bugs that can live entirely in one line of code.
I don't think these are specific use cases that a language can just consider to be outside of the scope in the same way they might choose not to support systems programming or DSLs or whatever; programmers need to be able to translate the ideas of how the program works into code and then diff between them to identify issues at both a macro and micro level regardless of what types of programs they're working on.
[author here] That’s a very good point - "not ceremony" was poorly phrased.
ROX does introduce more explicitness, which indeed introduces more ceremony. The goal isn’t to reduce keystrokes; it’s to reduce hidden behaviour.
A better framing would be: ROX prioritizes clarity over convenience. Explicitness may cost more keystrokes, but it eliminates hidden behavior. [README updated]
Yup exactly this
It's the "C is a simple language" BS again
Using a circular sawblade without the saw is as simple as it gets as well
The simpler it is, the more you get annoyed at it and the easier it is to shoot yourself in the foot with it, because the world is not perfect.
Abstractions are great and I'm dying on this hill
"getError" what year is it again?
> Lists are accessed only via .at()
If clarity is the goal, then data structures that support access by index should be called `arrays` or `vectors`
[author here] What @joshuamorton said + my rationale was that, for natural-language users too, a "list" should be more intuitive than `array` or `vector`.
I'm more than happy to be corrected though.
The idea of being inspired by natural language is completely at odds with also desiring clarity first
Also, why num/num32 for integer types, and no floating-point type?
[author here] Very good questions; I definitely would like to revisit num32 very shortly. I'd say the initial rationale for having num32 isn't coherent right now, but I'll have to verify things before removing the support.
We do have a floating-point type (it was missing from the type list in the README; I have just updated that after seeing this comment. Thank you!)
Well, clarity would be achieved with a name like u64. Is num signed? What's the range? Is it integers or floating point? All these things are hidden. With u64 there would be no questions open. (Well a few maybe, like overflow behavior, but can't have it all..)
There cannot be any num32. A num is a number, which can be a fixed-size integer, a floating-point number (of fixed size or not), or a bigint. Some languages also add decimals.
num32 being i32 or f32 makes no sense
This is very language dependent. People coming from python or Java would call them lists.
Vectors are a mathematical concept unless you use c++.
In Java it’s called Vector / list refers to linked list. Python doesn’t have a linked list type so it’s kinda irrelevant. But also not every language has to be Algol centric even though Algol has largely dominated the design space of popular languages due to familiarity.
> In Java it’s called Vector / list refers to linked list
What?!! No! Vector is almost never used in Java code. When you need index-based access, ArrayList is the much more common one, and it does implement List. So I would agree with parent commenter that List is the equivalent in Java.
A List in Java is a container that allows iterating over items efficiently, but does not necessarily provide efficient random access: https://docs.oracle.com/javase/8/docs/api/java/util/List.htm...
If you care about why Vector is nearly never used: it is synchronized by default, making it slower and more complex than ArrayList. Most Java programmers would prefer to implement synchronization themselves in case multi-threading is required since it nearly always involves having to synchronize multiple list operations at the same time, which cannot be done with Vector.
It's the same reason no one uses StringBuffer, but StringBuilder.
I’m aware. The preferred name is clearly Vector, and ArrayList is the wordy alternative they had to use to not break backwards compatibility. List is the name of the interface, which encompasses a lot of different data types. But ask a Java programmer if they prefer to use arrays or lists and I suspect they won’t even think twice about understanding that the list you’re referring to is a linked list, rather than asking what you mean because ArrayList implements List. A fun Java weirdness is that, of course, arrays themselves do not implement List.
> Nothing implicit happens behind your back.
And then, we have `repeat i in range(0, n, 1)`, with an implicit “i = i + n”, followed by an implicit comparison.
Also, range(0, n, 1) looks like a function call, but isn’t one, is it? Or can I write
evenDigits = range(0, 10, 2)
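(For concreteness, a guess at the kind of C++ such a loop would lower to; this is a sketch of the implicit comparison and step being pointed at, not ROX's actual codegen.)

    #include <cstdint>

    // Hypothetical lowering of `repeat i in range(0, n, 1)`: the implicit
    // pieces are the test (i < n) and the step (i += 1).
    void repeat_range(std::int64_t n) {
        for (std::int64_t i = 0; i < n; i += 1) {
            // loop body
        }
    }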
I think introducing a range type would be cleanest, but I think clarity would improve even more by giving up more on “Nothing implicit happens behind your back”. Allowing
for num in nums {…}
has quite a bit of magic, but IMO is clearer than iterating over indexes, followed by superfluous checks whether indexing succeeded.
Most of what the section "Why ROX exists" says reminds me of Rust and Zig, where both are more explicit (but Zig even more so, since it has no hidden allocations, while Rust hides them).
That said, I really miss the i{8|16|32|64|128|size}, u{8|16|32|64|128|size} and f{32|64} naming from other languages; I think having num and num32 is a mistake (IMHO), and naming them the way Rust/Zig do provides more clarity (and it's also more concise).
As for the "repeat" keyword, I find it odd because most other languages use "for", but I can understand the reason and might be able to get used to it.
Otherwise, I always find new programming languages interesting: what new things they bring to the table and the lessons they've learned from other PLs.
> I think having num and num32 is a mistake
There are still lots of 32-, 16- and even 8-bit CPUs in active use. For those, portable code that uses machine integers can make sense (and yes, it is harder to write because the integer limits will vary with the integer type used, but the added performance can be worth it).
> In ROX: <...> Nothing implicit happens behind your back.
> You write the logic. The language stays out of the way.
Writing business logic and having everything be explicit are polar opposites. For the programming language to stay out of the way, it should more closely resemble a concise version of English, with little to no language constructs.
If the only loop is `repeat i in range(start, end, step)` , how do you do a loop like "Keep reading from a buffer until it's empty"? I.e. any loop when you can't know the number of iterations needed when the loop starts?
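(For concreteness, the shape of loop being asked about, in C++ terms; buffer and drain are placeholder names, not ROX or standard-library APIs.)

    #include <queue>

    // The iteration count is unknown until the condition finally fails.
    void drain(std::queue<int>& buffer) {
        while (!buffer.empty()) {
            int item = buffer.front();
            buffer.pop();
            (void)item;   // processing would happen here
        }
    }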
Yes, support for unbounded loops is definitely on my roadmap towards v1.
Should be a priority. Without it, your language relies on recursion for Turing completeness.
I’d be curious to hear the author’s thoughts on Odin. Odin seems to have met many of the same goals as ROX. I am not implying the author shouldn’t keep going with their language.
This looks interesting! As a Go user I definitely see the value in boring but predictable languages. Does Rox have any support for concurrency?
Though it may be somewhat annoying, I think forcing named parameters would be a big win for clarity. Is this something you have considered?
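(To illustrate the suggestion rather than any existing ROX feature: the usual C++20 approximation is an options struct with designated initializers, which forces call sites to name what they pass. ConnectOptions/connect are hypothetical names.)

    struct ConnectOptions {
        int  timeout_ms = 1000;
        bool retry      = false;
    };

    void connect(const ConnectOptions& opts) { (void)opts; /* ... */ }

    int main() {
        connect({.timeout_ms = 500, .retry = true});  // every argument is named at the call site
    }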
Very interesting, I've read your readme and your core principles really resonate with me. How is memory managed?
Great question - we keep memory management intentionally simple.
1. There’s no manual memory management exposed at the language level (no pointers, no allocation APIs). I intend to keep it this way as long as possible.
2. Containers (list[T], dictionary[K,V]) compile directly to C++ STL types (std::vector, std::unordered_map).
3. Values follow a strict rule: primitives pass by value, containers pass by read-only reference. This prevents accidental aliasing/mutation across scopes and keeps ownership implicit but predictable.
Anything created in a scope is destroyed when that scope ends (standard C++ RAII). So in practice, memory management in Rox is C++ lifetime semantics underneath, but with a stricter surface language to reduce accidental complexity.
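(A guess at what rules 2-3 translate to in the emitted C++; a sketch, not the actual ROX compiler output.)

    #include <cstdint>
    #include <vector>

    // Containers arrive as read-only references, primitives by value,
    // and locals are destroyed at scope exit (RAII).
    std::int64_t sum(const std::vector<std::int64_t>& xs) {  // container: const reference
        std::int64_t total = 0;                              // primitive local
        for (auto x : xs) total += x;                        // read-only iteration
        return total;                                        // primitive returned by value
    }

    // The stated rule would exclude a signature like this, which allows
    // cross-scope mutation:
    //   void bump(std::vector<std::int64_t>& xs);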
That sounds like it's basically impossible to implement your own non-trivial data structures. You can only use the ones that are already in the standard library.
For instance, how would you represent a binary tree? What would the type of a node be? How would I write an "insert node" function, which requires that the newly-created node continues to exist after the function returns?
I'm not necessarily saying that this makes your language bad, but it seems to me that the scope of things that can be implemented is much much smaller than C++.
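(A sketch of what the question is getting at: a conventional C++ node owns its children through pointers, and insertion hands the new node's ownership into the tree so it outlives the call. Without surface-level pointers or allocation, ROX would need some other mechanism, e.g. index-based arenas built on list[T], to express this.)

    #include <memory>

    struct Node {
        int value = 0;
        std::unique_ptr<Node> left, right;   // owning links to children
    };

    void insert(std::unique_ptr<Node>& root, int v) {
        if (!root) {
            root = std::make_unique<Node>(); // allocated here, owned by the tree,
            root->value = v;                 // so it survives after insert() returns
            return;
        }
        insert(v < root->value ? root->left : root->right, v);
    }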
Sidenote: are we still going to see new languages appear after AI becomes the one that writes the code?
I'd say that for a new language to appear in that new world, it would need to offer new compile-time properties that AI could benefit from. Something like expressing general program properties / invariants that the compiler could check and the AI could iterate on.
This is great. I look forward to more "strict" languages whose deterministic compilers will give LLMs a tight feedback loop to catch bugs.
The best imperative language is Haskell's do notation, which offers everything you support here.
Does it have destructors?
[author here] as of today - no. I'm super keen to keep the concept of destructors and GC hidden from this language interface.
You have to unwrap every array access? That does not feel clear to me at all. Also this would make every hot loop slower.
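(What the hot-loop concern looks like in C++ terms: every checked access pays a bounds test, while an unchecked access is a plain load. A sketch only; ROX's actual .at() semantics may differ.)

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    std::int64_t sum_checked(const std::vector<std::int64_t>& xs) {
        std::int64_t total = 0;
        for (std::size_t i = 0; i < xs.size(); ++i)
            total += xs.at(i);   // bounds-checked (and throwing) on every iteration
        return total;
    }

    std::int64_t sum_unchecked(const std::vector<std::int64_t>& xs) {
        std::int64_t total = 0;
        for (std::size_t i = 0; i < xs.size(); ++i)
            total += xs[i];      // no per-access check; relies on the loop bound
        return total;
    }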
The amount of safety features here seems excessive. The language is stricter than Rust. It's not very "clear" either. For some reason the author has decided to rename concepts that are familiar to programmers, making it more difficult to switch to for experienced programmers (repeat instead of for, num instead of... float I assume?), but the language isn't really beginner-friendly either, due to the strict semantics.
This feels like vibe-coded slop. Why is this on the front page? HN has fallen off.
One thing I've always wondered about these types of projects is: how do you debug them? gdb at runtime? printf statements? I mean, when I'm debugging Python I mainly rely on print() and log files, so I guess that would work. It's been a long time since I used an IDE to step through a program anyway; I think way back with Borland Turbo C/C++ I used to step through statements to see how it all worked, but things are much too complex for that now.